Yongjian Chen, Tao Guan, Cheng Wang.
Abstract
A recently proposed product quantization method is efficient for large-scale approximate nearest neighbor search; however, its performance on unstructured vectors is limited. This paper introduces residual vector quantization based approaches that are appropriate for unstructured vectors. Database vectors are quantized by a residual vector quantizer, and the reproductions are represented by short codes composed of their quantization indices. The Euclidean distance between a query vector and a database vector is approximated by the asymmetric distance, i.e., the distance between the query vector and the reproduction of the database vector. An efficient exhaustive search approach is proposed that computes the asymmetric distance quickly. A straightforward non-exhaustive search approach is proposed for large-scale search. Our approaches are compared to two state-of-the-art methods, spectral hashing and product quantization, on both structured and unstructured datasets. Results show that our approaches obtain the best results in terms of the trade-off between search quality and memory usage.
Keywords: approximate nearest neighbor search; high-dimensional indexing; residual vector quantization
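The asymmetric distance described in the abstract can be illustrated with a minimal sketch. It assumes that the reproduction of a database vector is the sum of the codewords selected by its per-stage indices, which is the usual reading of residual vector quantization. The function names, the toy codebooks, and the brute-force ranking are illustrative assumptions; this is not the paper's fast computation scheme.

```python
def reconstruct(code, codebooks):
    """Sum the codewords selected by each stage index to rebuild the vector."""
    dim = len(codebooks[0][0])
    approx = [0.0] * dim
    for stage, index in enumerate(code):
        codeword = codebooks[stage][index]
        approx = [a + c for a, c in zip(approx, codeword)]
    return approx


def asymmetric_distance(query, code, codebooks):
    """Squared Euclidean distance between the raw query and the reproduction."""
    approx = reconstruct(code, codebooks)
    return sum((q - a) ** 2 for q, a in zip(query, approx))


def exhaustive_search(query, codes, codebooks, top_k=1):
    """Rank every database code by asymmetric distance; return top_k ids."""
    ranked = sorted(
        range(len(codes)),
        key=lambda i: asymmetric_distance(query, codes[i], codebooks),
    )
    return ranked[:top_k]


# Toy 2-D setup: two stages, two codewords per stage (made-up values).
codebooks = [
    [[0.0, 0.0], [4.0, 4.0]],    # stage 1: coarse codewords
    [[0.5, -0.5], [-0.5, 0.5]],  # stage 2: residual codewords
]
codes = [(0, 0), (0, 1), (1, 0), (1, 1)]  # short codes of four database vectors
query = [3.4, 4.6]
nearest_ids = exhaustive_search(query, codes, codebooks, top_k=2)
```

The query is never quantized here; only the database side is replaced by its reproduction, which is what makes the distance asymmetric.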
Year: 2010; PMID: 22163524; PMCID: PMC3231071; DOI: 10.3390/s101211259
Source DB: PubMed; Journal: Sensors (Basel); ISSN: 1424-8220; Impact factor: 3.576
Figure 1. Block diagrams of two-stage residual vector quantization. (a) Learning codebooks; (b) Quantizing a vector.
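The two steps named in the Figure 1 caption can be sketched as follows. This is a hedged illustration, assuming that each stage's codebook is learned with plain k-means on the residuals left by the previous stage; the helper names (`kmeans`, `learn_codebooks`, `quantize`), the random training data, and the parameter values are hypothetical and are not taken from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, codebook):
    """Index of the codeword closest to vec."""
    return min(range(len(codebook)), key=lambda k: sq_dist(vec, codebook[k]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; empty clusters keep their previous centroid."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, group in enumerate(groups):
            if group:
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def learn_codebooks(train, k, stages=2):
    """(a) Learn one codebook per stage, each on the previous stage's residuals."""
    codebooks, residuals = [], [list(v) for v in train]
    for stage in range(stages):
        codebook = kmeans(residuals, k, seed=stage)
        codebooks.append(codebook)
        residuals = [
            [x - c for x, c in zip(r, codebook[nearest(r, codebook)])]
            for r in residuals
        ]
    return codebooks


def quantize(vec, codebooks):
    """(b) Encode a vector as one index per stage; also return the final residual."""
    code, residual = [], list(vec)
    for codebook in codebooks:
        idx = nearest(residual, codebook)
        code.append(idx)
        residual = [x - c for x, c in zip(residual, codebook[idx])]
    return code, residual


# Synthetic 4-D training data, used only to exercise the sketch.
rng = random.Random(42)
train = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
codebooks = learn_codebooks(train, k=8)
code, residual = quantize(train[0], codebooks)
```

The returned `code` holds one index per stage, so with K codewords per stage and L stages the short code needs L·log2(K) bits.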
Dataset information.
| | SIFT | GIST | VLAD |
| Dimension of descriptor | 128 | 960 | 128 |
| Size of learning set | 100,000 | 500,000 | 1,491 |
| Size of database set | 1,000,000 | 1,000,000 | 1,491 |
| Size of query set | 10,000 | 1,000 | 500 |
Figure 2. Quantization error associated with K and L. (left) SIFT dataset; (right) GIST dataset.
Figure 3. Exhaustive search accuracy. (left) SIFT dataset; (right) GIST dataset.
Figure 4. RDE for SIFT dataset, exhaustive search method. (left) mean of RDE; (right) standard variance of RDE.
Figure 5. Search accuracy of non-exhaustive search. (left) SIFT dataset; (right) GIST dataset.
Comparison of RVQ and IVFRVQ on SIFT dataset.
| RVQ | 34 | 1,000,000 | 0.96 |
| IVFRVQ | 0.65 | 4,261 | 0.56 |
| IVFRVQ | 3.2 | 1,682 | 0.80 |
| IVFRVQ | 15.1 | 9,692 | 0.96 |
Comparison of RVQ and IVFRVQ on GIST dataset.
| RVQ | 36.1 | 1,000,000 | 0.67 |
| IVFRVQ | 2.9 | 5,205 | 0.36 |
| IVFRVQ | 5.7 | 2,423 | 0.55 |
| IVFRVQ | 20.5 | 16,512 | 0.74 |
Figure 6. Comparison of search accuracies obtained by spectral hashing, product quantization methods and our approaches. (left) SIFT dataset, 64-bit codes; (right) GIST dataset, 64-bit codes.
Comparison with state of the art on VLAD dataset.
| SH | 0.255 | 0.349 | 0.397 |
| PQ | 0.337 | 0.409 | 0.457 |
| RVQ | | | |
Search speed for 64-bit code and different methods (SIFT dataset).
| RVQ | 34 | 1,000,000 | 0.96 | |
| 33,602 | ||||
| PQ | 33.7 | 1,000,000 | 0.93 | |
| IVFPQ | 3 | 9,102 | 0.87 | |
| IVFPQ | 7.3 | 17,621 | 0.93 | |
| SH | 35.3 | 1,000,000 | 0.53 |