Pingping Liu1,2,3, Guixia Gou1, Xue Shan1, Dan Tao4, Qiuzhan Zhou5.
Abstract
A rich line of work focuses on designing elegant loss functions under the deep metric learning (DML) paradigm to learn a discriminative embedding space for remote sensing image retrieval (RSIR). Essentially, such an embedding space can efficiently distinguish deep feature descriptors. So far, most existing losses used in RSIR are based on triplets, which suffer from local optimization, slow convergence, and insufficient use of the similarity structure in a mini-batch. In this paper, we present a novel DML method, named global optimal structured loss, to address the limitations of triplet loss. Specifically, we use a softmax function rather than a hinge function in our loss to realize global optimization. In addition, we present a novel optimal structured loss, which globally learns an efficient deep embedding space with mined informative sample pairs, keeping positive pairs within a given boundary and pushing negative pairs beyond a given boundary. We have conducted extensive experiments on four public remote sensing datasets, and the results show that the proposed global optimal structured loss with the pairs mining scheme achieves state-of-the-art performance compared with the baselines.
Keywords: convolutional neural network; deep metric learning; global optimization; remote sensing image retrieval
Year: 2020 PMID: 31948002 PMCID: PMC6983082 DOI: 10.3390/s20010291
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. The optimization process under the proposed global optimal structured loss. Circles of different colors denote samples with different labels. The left part shows the original distribution of sample pairs. The blue circle with a small white circle in the center is the anchor; the green circle with a small black circle in the center is the hardest negative sample for the anchor, and their similarity is ; the blue circle with a small purple circle in the center is the hardest positive sample for the anchor, and their similarity is . We use a pairs mining strategy to sample more informative pairs for optimization. The black solid line is the negative border for negative pairs mining, and the black dotted line is the positive border for positive pairs mining. The circles with arrows denote the mined informative samples, and the arrows show the gradient direction. The right part shows the distribution after optimization. The blue solid line is the positive boundary, used to keep positive pairs within a hypersphere. The blue dotted line is the negative boundary, used to push negative pairs far away from the anchor.
Figure 2. The RSIR framework based on the global optimal structured loss. The upper part shows the training stage, in which we fine-tune the pre-trained network with our global optimal structured loss. We then use the fine-tuned network to extract more discriminative feature representations. The bottom part shows the testing stage: the query image and the testing set are fed into the fine-tuned network, and the top K most similar images are returned.
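The testing stage in the caption above amounts to ranking database embeddings by similarity to the query embedding. A minimal sketch follows, assuming cosine similarity on L2-normalized embeddings; the similarity measure and the helper names `l2_normalize` and `top_k` are assumptions for illustration, and the embeddings stand in for the output of the fine-tuned network.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k(query, database, k):
    """Return indices of the k database embeddings most similar to the query."""
    q = l2_normalize(query)
    scored = []
    for idx, emb in enumerate(database):
        e = l2_normalize(emb)
        cosine = sum(a * b for a, b in zip(q, e))
        scored.append((cosine, idx))
    # Highest similarity first; ties are broken by the lower index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in scored[:k]]


# Toy 2-D embeddings standing in for network outputs.
database = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
result = top_k([1.0, 0.0], database, k=2)
```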
Details of the baselines.
| Baseline | Feature Representations | Representation Size |
|---|---|---|
| DN7 | Convolutional | 4096 |
| DN8 | Convolutional | 4096 |
| ResNet50 | Convolutional + VLAD | 1500 |
| DBOW | Convolutional + BoW | 16,384 |
| ADLF | Convolutional + VLAD | 16,384 |
AveP (%) evaluation on four different remote sensing datasets, the best results would be bolded.
| Method | UCMD | SATREM | SIRI | NWPU |
|---|---|---|---|---|
| DN7 | 70.4 | 74.0 | 70.0 | 60.5 |
| DN8 | 70.5 | 74.0 | 69.6 | 59.5 |
| ResNet50 | 81.6 | 76.4 | 86.2 | 79.8 |
| DBOW | 83.0 | | 92.6 | 82.1 |
| ADLF | | 89.5 | 83.8 | 85.7 |
| GOSLm | 85.8 | 91.1 | | |
Precision (%) of 21 geographic categories in UCMD with various RSIR methods. The best results would be highlighted in bold.
| Categories | DN7 | DN8 | ResNet50 | DBOW | ADLF | GOSLm |
|---|---|---|---|---|---|---|
| Agriculture | 94 | 93 | 85 | 92 | 80 | |
| Airplane | 74 | 75 | 93 | 95 | | 82 |
| Baseball | 78 | 77 | 73 | 87 | 77 | |
| Beach | 94 | 97 | | 88 | 94 | 92 |
| Buildings | 51 | 47 | 74 | | 85 | 78 |
| Chaparral | 98 | 98 | 95 | 94 | | 95 |
| Dense | 36 | 33 | 62 | 96 | 90 | 55 |
| Forest | 98 | 98 | 87 | | 98 | 95 |
| Freeway | 72 | 71 | 69 | 78 | | 83 |
| Golf | 63 | 65 | 73 | 85 | 83 | |
| Harbor | 85 | 84 | 97 | 95 | | 95 |
| Intersection | 65 | 61 | 81 | 77 | | 80 |
| Medium-density | 66 | 60 | 80 | 74 | | 59 |
| Mobile | 66 | 65 | 74 | 76 | | 80 |
| Overpass | 57 | 60 | 97 | 86 | | 78 |
| Parking | 92 | 90 | 92 | 67 | | 95 |
| River | 48 | 51 | 66 | 74 | | 86 |
| Runway | 87 | 83 | 93 | 66 | | 91 |
| Sparse | 67 | 78 | 69 | 79 | 79 | |
| Storage | 40 | 45 | 86 | 50 | 93 | |
| Tennis | 48 | 53 | 70 | 94 | 94 | |
| Average | 70.4 | 70.5 | 81.6 | 83.0 | | 85.8 |
Precision (%) of 20 geographic categories in SATREM with various RSIR methods. The best results would be highlighted in bold.
| Categories | DN7 | DN8 | ResNet50 | DBOW | ADLF | GOSLm |
|---|---|---|---|---|---|---|
| Agriculture | 85 | 85 | 86 | | 90 | 92 |
| Airplane | 64 | 64 | 86 | 96 | 88 | |
| Artificial | 74 | 78 | 93 | 97 | 81 | |
| Beach | 68 | 66 | 86 | 95 | 87 | |
| Buildings | 74 | 71 | 92 | | 94 | 94 |
| Chaparral | 71 | 69 | 79 | 96 | 90 | |
| Cloud | | | 97 | 99 | 97 | |
| Container | 72 | 74 | 97 | 96 | | 92 |
| Dense | 87 | 85 | 89 | | 94 | 92 |
| Factory | 59 | 58 | 69 | | 74 | 72 |
| Forest | 94 | 93 | 89 | 96 | 95 | |
| Harbor | 60 | 65 | 80 | | 96 | |
| Medium-density | 68 | 66 | 67 | | 67 | 53 |
| Ocean | 95 | 94 | 91 | 92 | 92 | |
| Parking | 69 | 63 | 87 | 95 | | 88 |
| River | 60 | 63 | | 71 | 74 | |
| Road | 64 | 60 | 85 | 82 | | 90 |
| Runway | 84 | 82 | 96 | 86 | | |
| Sparse | 69 | 75 | 75 | | 85 | 78 |
| Storage | 63 | 70 | 98 | 91 | | 99 |
| Average | 74.0 | 74.0 | 86.2 | | 89.5 | 91.1 |
Precision (%) of 12 geographic categories in SIRI with various RSIR methods. The best results would be highlighted in bold.
| Categories | DN7 | DN8 | ResNet50 | DBOW | ADLF | GOSLm |
|---|---|---|---|---|---|---|
| Agriculture | 82 | 79 | 95 | 99 | 94 | |
| Commercial | 80 | 80 | 90 | 99 | 97 | |
| Harbor | 55 | 56 | 63 | 89 | 74 | |
| Idle | 58 | 60 | 63 | 97 | 80 | |
| Industrial | 72 | 70 | 88 | 90 | 96 | |
| Meadow | 71 | 63 | 77 | 93 | 82 | |
| Overpass | 71 | 76 | 80 | 89 | 94 | |
| Park | 67 | 67 | 82 | 87 | 90 | |
| Pond | 47 | 50 | 57 | | 74 | 96 |
| Residential | 81 | 78 | 84 | 97 | 94 | |
| River | 59 | 57 | 44 | | 69 | 77 |
| Water | 99 | 99 | 94 | 86 | 99 | |
| Average | 69.9 | 69.5 | 76.4 | 92.6 | 86.9 | 96.6 |
Precision (%) of 45 geographic categories in NWPU with various RSIR methods. The best results would be highlighted in bold.
| Categories | DN7 | DN8 | ResNet50 | DBOW | ADLF | GOSLm |
|---|---|---|---|---|---|---|
| Airplane | 56 | 57 | 88 | | 93 | 96 |
| Airport | 50 | 47 | 72 | | 81 | 90 |
| Baseball Diamond | 43 | 45 | 69 | 86 | 64 | |
| Basketball Court | 33 | 32 | 61 | 83 | 71 | |
| Beach | 56 | 58 | 77 | 85 | 83 | |
| Bridge | 67 | 66 | 73 | | 81 | 93 |
| Chaparral | 93 | 93 | 98 | 96 | | 98 |
| Church | 25 | 26 | 56 | | 64 | 64 |
| Circular Farmland | 83 | 84 | 97 | 94 | | 97 |
| Cloud | 91 | 91 | 92 | | | |
| Commercial Area | 53 | 45 | 82 | 79 | | 78 |
| Dense Residential | 62 | 58 | 89 | 90 | | 92 |
| Desert | 85 | 83 | 87 | | 92 | 90 |
| Forest | 91 | 89 | 95 | 95 | | 94 |
| Freeway | 55 | 52 | 65 | 64 | 86 | |
| Golf Course | 63 | 60 | 96 | 82 | | 96 |
| Ground Track Field | 59 | 61 | 63 | 80 | 77 | |
| Harbor | 64 | 65 | 93 | 88 | 97 | |
| Industrial Area | 57 | 52 | 75 | 85 | 88 | |
| Intersection | 57 | 51 | 64 | 80 | 72 | |
| Island | 78 | 73 | 88 | 88 | | 93 |
| Lake | 69 | 69 | 80 | 85 | 85 | |
| Meadow | 82 | 82 | 84 | 90 | | |
| Medium Residential | 57 | 51 | 78 | | 77 | 82 |
| Mobile Home Park | 52 | 52 | 93 | 83 | | 94 |
| Mountain | 74 | 71 | 88 | 95 | | 86 |
| Overpass | 51 | 53 | 87 | 74 | 90 | |
| Palace | 25 | 23 | 41 | | 56 | 51 |
| Parking Lot | 71 | 68 | 95 | 70 | 97 | |
| Railway | 60 | 58 | 88 | 84 | | 77 |
| Railway Station | 48 | 46 | 62 | 86 | 73 | 81 |
| Rectangular Farmland | 71 | 66 | 82 | 66 | | 86 |
| River | 50 | 50 | 70 | 76 | 75 | |
| Roundabout | 61 | 61 | 72 | 83 | 90 | |
| Runway | 63 | 58 | 80 | 78 | 89 | |
| Sea Ice | 91 | 89 | 98 | 90 | | |
| Ship | 43 | 46 | 61 | 65 | 69 | |
| Snowberg | 78 | 79 | 97 | 83 | 98 | |
| Sparse Residential | 58 | 62 | 69 | 84 | 70 | |
| Stadium | 59 | 57 | 81 | 57 | 86 | |
| Storage Tank | 61 | 62 | 88 | 48 | 94 | |
| Tennis Court | 34 | 37 | 80 | 72 | 78 | |
| Terrace | 54 | 54 | 88 | 76 | | 89 |
| Thermal Power Station | 43 | 45 | 68 | 72 | 78 | |
| Wetland | 50 | 49 | 82 | 70 | 80 | |
| Average | 60.5 | 59.4 | 79.8 | 82.1 | 85.7 | |
AveP (%) evaluated on four different remote sensing datasets. The best results would be bold.
| Method | UCMD | SATREM | SIRI | NWPU |
|---|---|---|---|---|
| N-pairs | 82.2 | 85.3 | 92.8 | 84.3 |
| GLSL | 82.6 | 85.1 | 94.9 | 85.5 |
| GLSLm | 84.3 | 87.2 | 95.2 | 88.6 |
| GOSL | 85.1 | 86.8 | 95.3 | 85.8 |
| GOSLm | | | | |
Recall@K (%) evaluated on UCMD. The best results would be bold.
| Recall@K (%) | 1 | 2 | 4 | 8 | 16 | 32 |
|---|---|---|---|---|---|---|
| N-pairs | 95.3 | 98.3 | 98.5 | | | |
| GLSL | 94.2 | 96.1 | 96.9 | 98.3 | 98.3 | 99.5 |
| GLSLm | 94.7 | 96.4 | 97.1 | 97.6 | 98.1 | |
| GOSL | 95.4 | 98.1 | 98.3 | 98.5 | 99.0 | |
| GOSLm | | | | | | |
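The Recall@K columns above can be read with the help of a small sketch. It assumes the definition commonly used in deep metric learning, where a query counts as a hit if at least one of its top K retrieved items shares the query's label; the source does not spell out its definition, and the function name `recall_at_k` and the toy labels are hypothetical.

```python
def recall_at_k(ranked_labels, query_labels, k):
    """Percentage of queries with at least one same-label item in the top k.

    ranked_labels[i] holds the labels of the items retrieved for query i,
    ordered from most to least similar (the query itself excluded).
    """
    hits = sum(1 for ranked, label in zip(ranked_labels, query_labels)
               if label in ranked[:k])
    return 100.0 * hits / len(query_labels)


# Toy example: three queries with two retrieved labels each.
ranked = [["a", "b"], ["b", "a"], ["c", "c"]]
queries = ["a", "a", "b"]
r1 = recall_at_k(ranked, queries, 1)
r2 = recall_at_k(ranked, queries, 2)
```

Under this definition Recall@K cannot decrease as K grows, which matches the pattern in the rows above.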
Recall@K (%) evaluated on SATREM. The best results would be bold.
| Recall@K (%) | 1 | 2 | 4 | 8 | 16 | 32 |
|---|---|---|---|---|---|---|
| N-pairs | 93.6 | 95.6 | 97.5 | 98.6 | 99.3 | 99.8 |
| GLSL | 92.8 | 96.5 | 97.3 | 98.3 | 99.3 | 99.6 |
| GLSLm | 94.5 | 97.1 | 98.6 | 99.5 | 99.6 | 99.6 |
| GOSL | 93.3 | 96.0 | 98.0 | 98.5 | 99.3 | 99.6 |
| GOSLm | | | | | | |
Recall@K (%) evaluated on SIRI. The best results would be bold.
| Recall@K (%) | 1 | 2 | 4 | 8 | 16 | 32 |
|---|---|---|---|---|---|---|
| N-pairs | 95.0 | 96.0 | 96.8 | 97.7 | 98.5 | 99.5 |
| GLSL | 95.4 | 96.2 | 97.5 | 98.1 | 98.9 | 98.9 |
| GLSLm | 95.8 | 96.4 | 96.8 | 98.1 | 98.5 | 99.5 |
| GOSL | 96.0 | 96.6 | 97.2 | 97.5 | 97.9 | 98.7 |
| GOSLm | | | | | | |
Recall@K (%) evaluated on NWPU. The best results would be bold.
| Recall@K (%) | 1 | 2 | 4 | 8 | 16 | 32 |
|---|---|---|---|---|---|---|
| N-pairs | 87.3 | 92.5 | 95.1 | 96.9 | 98.0 | |
| GLSL | 87.2 | 91.0 | 93.0 | 94.5 | 95.3 | 96.0 |
| GLSLm | 90.3 | 93.6 | 95.8 | 97.1 | 98.0 | 98.5 |
| GOSL | 87.4 | 91.2 | 93.3 | 94.8 | 95.7 | 96.1 |
| GOSLm | | | | | | |
Figure 3. Six retrieval cases with the top-10 returned results on UCMD. The left part shows three easy retrieval cases and the right part shows three hard retrieval cases. For each retrieval case, the top, middle, and bottom rows show the results obtained with GOSLm, GLSLm, and N-pairs, respectively. Green and red borders denote correct and incorrect retrieval results, respectively.
The AveP (%) on different with on SIRI-WHU with . The best results would be highlighted in bold.
| | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1 |
|---|---|---|---|---|---|---|
| | 96.3 | | 96.1 | 96.0 | 95.8 | 95.7 |
The AveP (%) on different with on SIRI-WHU with . The best results would be bold.
| | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 |
|---|---|---|---|---|---|---|
| | 95.4 | 95.6 | 95.6 | 95.8 | | 96.0 |
AveP (%) comparison on our proposed method with embedding size at {64, 128, 256, 512, 1024}. The best results would be highlighted in bold.
| | 64 | 128 | 256 | 512 | 1024 |
|---|---|---|---|---|---|
| UCMD | 84.4 | 85.0 | 85.1 | | 85.6 |
| SATREM | 85.2 | 85.6 | 86.8 | | 86.9 |
| SIRI | 95.2 | 95.9 | 96.0 | | 95.9 |
| NWPU | 87.9 | 88.2 | 88.6 | | 88.8 |
AveP (%) comparison on our proposed method with batch size at {10, 20, 40, 60, 100, 160}. The “-” denotes the related results are invalid. The best results would be bold.
| | 10 | 20 | 40 | 60 | 100 | 160 |
|---|---|---|---|---|---|---|
| UCMD | 84.7 | 85.7 | | 85.6 | 85.5 | - |
| SATREM | 86.5 | 88.3 | | 86.5 | 86.1 | - |
| SIRI | 95.5 | 95.6 | | 95.5 | - | - |
| NWPU | 83.9 | 87.3 | | 88.1 | 88.4 | 85.9 |
Retrieval time (in milliseconds) with various test datasets and embedding size. The best results would be in bold.
| DB Size | DN7 | DN8 | DBOW | ADLF (1024) | ADLF (512) | ADLF (256) | GOSLm (1024) | GOSLm (512) | GOSLm (256) |
|---|---|---|---|---|---|---|---|---|---|
| 50 | 5.80 | 5.70 | 2.30 | 1.70 | 0.97 | 0.61 | 0.34 | 0.29 | |
| 100 | 17.10 | 17.30 | 6.10 | 3.31 | 3.43 | 1.85 | 0.89 | 0.46 | |
| 200 | 58.70 | 58.40 | 21.40 | 11.54 | 11.13 | 6.43 | 1.90 | 0.72 | |
| 300 | 127.40 | 127.80 | 45.90 | 28.18 | 16.56 | 10.72 | 2.59 | 1.32 | |
| 400 | 223.10 | 224.30 | 79.60 | 49.01 | 29.72 | 14.87 | 3.37 | 1.60 | |
| 500 | 246.00 | 344.90 | 123.90 | 77.83 | 44.90 | 22.98 | 4.20 | 2.35 | |