Ming Zhu, Chun Chen, Nian Wang, Jun Tang, Wenxia Bao.
Abstract
This paper focuses on fine-grained image retrieval based on sketches. Sketches capture detailed information, but their highly abstract nature makes visual comparison with images difficult. Although existing models take fine-grained details into account, they cannot accurately highlight distinctive local features, and they ignore the correlation between features. To solve this problem, we design a gradually focused bilinear attention model to extract detailed information more effectively. Specifically, the attention model first focuses accurately on representative local positions, and weighted bilinear coding is then used to find more discriminative feature representations. Finally, a global triplet loss function is used to avoid oversampling or undersampling. The experimental results show that the proposed method outperforms state-of-the-art sketch-based image retrieval methods.
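The abstract names a global triplet loss without defining it. As a reference point only, the standard margin-based triplet loss that such objectives usually build on can be sketched as follows. The function names, the squared Euclidean distance, and the margin value of 0.3 are assumptions made for illustration; this is not the paper's global variant.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(sketch, positive, negative, margin=0.3):
    """Standard triplet ranking loss (hypothetical helper, not the paper's loss).

    The loss is zero once the matching photo is closer to the sketch
    than the non-matching photo by at least `margin`.
    """
    return max(0.0, margin + sq_dist(sketch, positive) - sq_dist(sketch, negative))


# Matching photo is clearly closer than the non-matching one: no penalty.
easy = triplet_loss([0, 0], [0, 0], [1, 0])

# Non-matching photo is closer than the matching one: positive penalty.
hard = triplet_loss([0, 0], [1, 0], [0, 0])
```

In this standard form the loss depends on which negative is paired with each sketch, which is the sampling sensitivity the abstract says the global formulation is meant to avoid.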
Year: 2019 PMID: 31136610 PMCID: PMC6538165 DOI: 10.1371/journal.pone.0217168
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Model framework.
Fig 2. Attention model.
Fig 3. Weighted bilinear coding model.
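A minimal sketch of attention-weighted bilinear pooling may help make the idea behind Figs 2 and 3 concrete. It assumes that the attention model yields one scalar weight per spatial location (consistent with the single-filter 1×1 convolution followed by a sigmoid in the architecture table) and that the bilinear code is the weighted sum of outer products of each location's feature vector with itself. The signed square root and L2 normalisation are a common convention for bilinear features and are assumed here, as are all function and variable names.

```python
import math


def weighted_bilinear_pool(features, weights):
    """Pool L location vectors of length C into one C*C descriptor.

    features: list of L feature vectors, each of length C
    weights:  list of L attention scores, one per location
    Returns the flattened, normalised sum of w_l * outer(f_l, f_l).
    """
    c = len(features[0])
    pooled = [0.0] * (c * c)
    for f, w in zip(features, weights):
        for i in range(c):
            for j in range(c):
                pooled[i * c + j] += w * f[i] * f[j]
    # Signed square root, then L2 normalisation (assumed post-processing).
    pooled = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in pooled)) or 1.0
    return [v / norm for v in pooled]


# Two locations with 2-channel features; the second location gets half weight.
code = weighted_bilinear_pool([[1.0, 0.0], [0.0, 2.0]], [1.0, 0.5])
```

With C = 256 channels the descriptor has 256 × 256 = 65536 entries, which matches the 1×65536 output listed for the bilinear pooling layer.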
The architecture of CNN.
| Layer | Type | Filter Size | Filter Num | Stride | Pad | Output Size |
|---|---|---|---|---|---|---|
| | Input | - | - | - | - | 225×225 |
| | Conv | 15×15 | 64 | 3 | 0 | 71×71 |
| | Relu | - | - | - | - | 71×71 |
| | Maxpool | 3×3 | - | 2 | 0 | 35×35 |
| | Conv | 5×5 | 128 | 1 | 0 | 31×31 |
| | Relu | - | - | - | - | 31×31 |
| | Maxpool | 3×3 | - | 2 | 0 | 15×15 |
| | Conv | 3×3 | 256 | 1 | 1 | 15×15 |
| | Relu | - | - | - | - | 15×15 |
| | Conv | 1×1 | 1 | 1 | 0 | 15×15 |
| | Sigmoid | - | - | - | - | 15×15 |
| | Conv | 3×3 | 256 | 1 | 1 | 15×15 |
| | Relu | - | - | - | - | 15×15 |
| | Conv | 5×5 | 128 | 1 | 0 | 31×31 |
| | Relu | - | - | - | - | 31×31 |
| | Conv | 3×3 | 256 | 1 | 1 | 15×15 |
| | Relu | - | - | - | - | 15×15 |
| | Maxpool | 3×3 | - | 2 | 0 | 7×7 |
| | Conv | 1×1 | 1 | 1 | 0 | 7×7 |
| | Sigmoid | - | - | - | - | 7×7 |
| | Bilinear pool | - | - | - | - | 1×65536 |
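The output sizes in the table can be checked with the usual convolution and pooling arithmetic, floor((n + 2·pad − k) / stride) + 1. The sketch below traces the first five size-changing layers from the 225×225 input; the helper names are hypothetical, and the layer list is read directly from the table rows.

```python
def out_size(n, k, stride=1, pad=0):
    """Spatial output size of a conv or pooling layer on an n x n input."""
    return (n + 2 * pad - k) // stride + 1


# (description, kernel, stride, pad), in the order the rows appear in the table
layers = [
    ("Conv 15x15", 15, 3, 0),
    ("Maxpool 3x3", 3, 2, 0),
    ("Conv 5x5", 5, 1, 0),
    ("Maxpool 3x3", 3, 2, 0),
    ("Conv 3x3", 3, 1, 1),
]


def trace(n, layers):
    """Return the spatial size after each layer, starting from size n."""
    sizes = []
    for _name, k, s, p in layers:
        n = out_size(n, k, s, p)
        sizes.append(n)
    return sizes


sizes = trace(225, layers)
print(sizes)  # [71, 35, 31, 15, 15]

# The later 3x3, stride-2 max pooling takes 15x15 down to 7x7.
print(out_size(15, 3, stride=2))  # 7
```

The same arithmetic shows that a 5×5 convolution without padding yields 31×31 only from a 35×35 input, so the second 5×5 row in the table presumably sits in a parallel branch fed from the 35×35 stage rather than after the 15×15 layers listed just above it.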
Fig 4. Image examples of the three datasets.
Comparison with baseline on QMUL-Shoe.
| QMUL-Shoe | Acc.@1 | Acc.@10 |
|---|---|---|
| | 52.17% | 92.17% |
| | 61.74% | 94.78% |
| | 65.22% | 95.65% |
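Acc.@K in these tables is read here as the fraction of query sketches whose true matching photo appears among the K nearest gallery photos, which is the usual convention in sketch-based retrieval. A minimal sketch of that computation follows; the function name and the toy distance matrix are illustrative assumptions, not data from the paper.

```python
def acc_at_k(dist_rows, true_idx, k):
    """Fraction of queries whose true match ranks within the top k.

    dist_rows: one list of sketch-to-photo distances per query sketch
    true_idx:  index of the matching photo for each query sketch
    """
    hits = 0
    for row, target in zip(dist_rows, true_idx):
        # Gallery indices ordered from nearest to farthest.
        ranked = sorted(range(len(row)), key=row.__getitem__)
        if target in ranked[:k]:
            hits += 1
    return hits / len(dist_rows)


# Toy example: 3 query sketches against a gallery of 3 photos.
dists = [
    [0.1, 0.5, 0.9],  # true match 0 is ranked first
    [0.7, 0.2, 0.4],  # true match 2 is ranked second
    [0.3, 0.8, 0.6],  # true match 1 is ranked third
]
truth = [0, 2, 1]

top1 = acc_at_k(dists, truth, 1)
top2 = acc_at_k(dists, truth, 2)
```

Under this reading Acc.@10 can never be lower than Acc.@1 for the same method, which holds for every row reported here.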
Comparison with baseline on Handbag.
| Handbag | Acc.@1 | Acc.@10 |
|---|---|---|
| | 39.88% | 82.14% |
| | 49.40% | 82.74% |
| | 57.74% | 90.48% |
Contributions of different components on QMUL-Shoe.
| QMUL-Shoe | Acc.@1 | Acc.@10 |
|---|---|---|
| | 58.26% | 94.78% |
| | 58.26% | 96.52% |
| | 61.74% | 92.17% |
| | 64.35% | 94.78% |
| | 59.13% | 94.78% |
| | 60.87% | 92.17% |
| | 65.22% | 95.65% |
Contributions of different components on Handbag.
| Handbag | Acc.@1 | Acc.@10 |
|---|---|---|
| | 52.38% | 83.33% |
| | 54.17% | 83.33% |
| | 55.36% | 86.31% |
| | 37.50% | 75.60% |
| | 52.38% | 86.31% |
| | 51.79% | 86.31% |
| | 57.74% | 90.48% |
Comparison with baseline on QMUL-Chair.
| QMUL-Chair | Acc.@1 | Acc.@10 |
|---|---|---|
| | 72.16% | 98.96% |
| | 81.44% | 95.88% |
| | 87.63% | 97.94% |
Contributions of different components on QMUL-Chair.
| QMUL-Chair | Acc.@1 | Acc.@10 |
|---|---|---|
| | 84.41% | 96.91% |
| | 86.60% | 96.91% |
| | 82.47% | 96.91% |
| | 82.47% | 98.97% |
| | 84.54% | 96.91% |
| | 85.57% | 97.94% |
| | 87.63% | 97.94% |