Shicheng Sun, Yu Gu, Mengjun Ren.
Abstract
Ship recognition is a fundamental and essential step in maritime activities, and it can be widely used in maritime rescue, vessel management, and other applications. However, most studies conducted in this area use synthetic aperture radar (SAR) images and space-borne optical images, and those utilizing visible images are limited to the coarse-grained level. In this study, we constructed a fine-grained ship dataset of real images and simulation images consisting of five categories of ships. To solve the problem of low accuracy in fine-grained ship classification across different viewing angles in visible images, a network based on domain adaptation and a transformer was proposed. Concretely, style transfer was first used to reduce the gap between the simulation images and the real images. Then, with the goal of using the simulation images to support classification of the real images, a domain adaptation network based on local maximum mean discrepancy (LMMD) was used to align the distributions of the two domains. Furthermore, considering the innate attention mechanism of the transformer, a vision transformer (ViT) was chosen as the feature extraction module to extract fine-grained features, and a fully connected layer was used as the classifier. Finally, the experimental results showed that our network performed well on the fine-grained ship dataset, with an overall accuracy of 96.0%; the mean average precision (mAP) of detecting first and then classifying with our network was 87.5%, which also verified the feasibility of using images generated by computer simulation technology for auxiliary training.
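The subdomain alignment named in the abstract, LMMD, compares source and target features class by class rather than as two undifferentiated clouds. The paper's implementation is not reproduced in this record; the following is a minimal NumPy sketch of the standard LMMD formulation under stated assumptions (Gaussian kernel, one-hot source labels, soft target predictions as class weights):

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise squared distances -> RBF kernel matrix k(a_i, b_j).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lmmd(xs, ys, xt, yt_prob, num_classes, sigma=1.0):
    """Local MMD: per-class MMD between source features xs (hard labels ys)
    and target features xt (soft predictions yt_prob), averaged over classes."""
    Kss = gaussian_kernel(xs, xs, sigma)
    Ktt = gaussian_kernel(xt, xt, sigma)
    Kst = gaussian_kernel(xs, xt, sigma)
    loss = 0.0
    for c in range(num_classes):
        ws = (ys == c).astype(float)       # source weights: class-c indicator
        wt = yt_prob[:, c].astype(float)   # target weights: predicted prob of class c
        if ws.sum() == 0 or wt.sum() == 0:
            continue                       # class absent from this batch
        ws /= ws.sum()
        wt /= wt.sum()
        # Weighted MMD^2 for subdomain c.
        loss += ws @ Kss @ ws + wt @ Ktt @ wt - 2.0 * ws @ Kst @ wt
    return loss / num_classes
```

The loss is zero when each class-conditional distribution matches across domains, so minimizing it alongside cross-entropy pulls same-class simulation and real features together.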
Keywords: computer simulation; domain adaptation; fine-grained ship recognition; local maximum mean discrepancy; vision transformer
Year: 2022 PMID: 35590933 PMCID: PMC9104243 DOI: 10.3390/s22093243
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Real ship images.
Figure 2. 3D modeling of the ship target.
Figure 3. Ship image generation process based on computer simulation.
Figure 4. Simulation ship images.
Table 1. Dataset information. Classes: Arleigh Burke (Arl.), Murasame (Mur.), Nimitz (Nim.), Ticonderoga (Tic.), and Wasp.
| Dataset | Class | Arl. | Mur. | Nim. | Tic. | Wasp | Total |
|---|---|---|---|---|---|---|---|
| Simulation | 5 | 2160 | 2160 | 2160 | 2160 | 2160 | 10,800 |
| Real | 5 | 545 | 497 | 519 | 502 | 505 | 2568 |
| Test | 5 | 48 | 45 | 47 | 43 | 43 | 226 |
Figure 5. Target size scatter plot for the (a) simulation dataset, (b) real dataset, and (c) test dataset.
Figure 6. Training process.
Figure 7. The architecture of the training recognition model based on domain adaptation.
Figure 8. ViT model overview.
Figure 9. Style images and results. (a) Some style images selected from real samples. (b) Original simulation images and results processed by style transfer.
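The specific style-transfer method is not named in this record; Figure 9 only shows inputs and outputs. As one common building block for this kind of simulation-to-real appearance matching — an illustrative assumption, not the authors' confirmed method — adaptive instance normalization (AdaIN) re-scales content feature channels to the style image's channel statistics:

```python
import numpy as np

def adain(content: np.ndarray, style: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """AdaIN on (C, H, W) feature maps: normalize each content channel,
    then re-scale/shift it to the style channel's std/mean."""
    c_mu = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    s_mu = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True) + eps
    return s_std * (content - c_mu) / c_std + s_mu
```

Applied in a pretrained encoder's feature space and decoded back to pixels, this transfers the real images' global appearance (sea color, haze, sensor tone) onto rendered ships while preserving their geometry.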
Table 2. Training parameters of the recognition model.
| Backbone | Loss | Epoch | Optimizer | Batch Size | Learning Rate | Image Size |
|---|---|---|---|---|---|---|
| ViT-B_16 | CE + LMMD + Contrastive | 100 | SGD (m = 0.9) | 16 | 1 × 10⁻² | 224 × 224 |
Table 3. Test results on the fine-grained ship dataset.
| ID * | Backbone | Training Strategy | Style Transfer | OA/% |
|---|---|---|---|---|
| 1 | ResNet50 | Only real dataset | | 80.5 |
| 2 | ResNet50 | Directly mix datasets | | 76.1 |
| 3 | ResNet50 | Directly mix datasets | √ | 80.0 |
| 4 | ResNet50 | Domain adaptation (global alignment) | | 90.7 |
| 5 | ResNet50 | Domain adaptation (global alignment) | √ | 92.4 |
| 6 | ResNet50 | Domain adaptation (sub-domain alignment) | | 92.5 |
| 7 | ResNet50 | Domain adaptation (sub-domain alignment) | √ | 93.4 |
| 8 | ViT-B_16 | Only real dataset | | 85.4 |
| 9 | ViT-B_16 | Directly mix datasets | | 81.4 |
| 10 | ViT-B_16 | Directly mix datasets | √ | 84.5 |
| 11 | ViT-B_16 | Domain adaptation (global alignment) | | 93.8 |
| 12 | ViT-B_16 | Domain adaptation (global alignment) | √ | 95.6 |
| 13 | ViT-B_16 | Domain adaptation (sub-domain alignment) | | 95.2 |
| 14 | ViT-B_16 | Domain adaptation (sub-domain alignment) | √ | 96.0 |
* Table 3 and Table 4 use the same method for the corresponding ID.
Table 4. Accuracy of each class and average accuracy.
| ID * | Arl./% | Mur./% | Nim./% | Tic./% | Wasp/% | AA/% |
|---|---|---|---|---|---|---|
| 1 | 89.6 | 82.2 | 61.7 | 69.8 | 100.0 | 80.7 |
| 2 | 62.5 | 73.3 | 68.1 | 81.4 | 97.7 | 76.6 |
| 3 | 81.3 | 80.0 | 63.8 | 76.7 | 100.0 | 80.4 |
| 4 | 95.8 | 82.2 | 95.7 | 79.1 | 100.0 | 90.6 |
| 5 | 95.8 | 86.7 | 93.6 | 86.0 | 100.0 | 92.4 |
| 6 | – | 84.4 | 91.5 | 93.0 | 95.3 | 92.9 |
| 7 | 100.0 | 86.7 | 100.0 | 79.1 | 100.0 | 93.1 |
| 8 | 75.0 | 80.0 | 97.9 | 90.7 | 83.7 | 85.5 |
| 9 | 77.1 | 60.0 | 95.7 | 83.7 | 90.7 | 81.4 |
| 10 | 72.9 | 95.6 | 95.7 | 72.1 | 86.1 | 84.5 |
| 11 | 83.3 | 93.3 | 100.0 | 95.3 | 97.7 | 93.9 |
| 12 | 89.6 | 93.3 | 97.9 | 97.7 | 100.0 | 95.7 |
| 13 | 81.3 | 95.6 | 100.0 | 100.0 | 100.0 | 95.4 |
| 14 | 85.4 | – | – | 97.7 | – | – |
* Table 3 and Table 4 use the same method for the corresponding ID. "–" marks a cell that is not recoverable.
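The two summary metrics in these tables are related but distinct: OA (Table 3) weights each per-class accuracy by that class's test-sample count, while AA (Table 4) is the unweighted mean over the five classes. A small sketch reproducing the ID 2 row from both tables, using the test-set counts from the dataset table:

```python
def summary_accuracies(class_acc, class_counts):
    """OA = sample-count-weighted mean of per-class accuracies;
    AA = plain (unweighted) mean over classes."""
    total = sum(class_counts)
    oa = sum(a * n for a, n in zip(class_acc, class_counts)) / total
    aa = sum(class_acc) / len(class_acc)
    return round(oa, 1), round(aa, 1)

# Per-class accuracies for ID 2 and the test-set counts per class
print(summary_accuracies([62.5, 73.3, 68.1, 81.4, 97.7], [48, 45, 47, 43, 43]))
# -> (76.1, 76.6): OA for ID 2 in Table 3 and AA for ID 2 in Table 4
```

Because the test classes are nearly balanced (43–48 samples each), OA and AA stay close; they diverge more when one class dominates the test set.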
Figure 10. Misclassification examples: (a) misclassifying Ticonderoga as Arleigh Burke; (b) misclassifying Arleigh Burke as Murasame; (c) misclassifying Arleigh Burke as Murasame.
Table 5. Results of direct detection versus detecting first and then classifying.
| Method | Arl./% | Mur./% | Nim./% | Tic./% | Wasp/% | mAP/% |
|---|---|---|---|---|---|---|
| YOLOv5s | 81.6 | 81.5 | 61.5 | 58.2 | 78.7 | 72.3 |
| Cascade R-CNN | 86.5 | 88.6 | 62.4 | 63.0 | 86.1 | 77.3 |
| YOLOv5s + Ours | 82.1 | 87.6 | 89.6 | 81.4 | 96.9 | 87.5 |
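The "YOLOv5s + Ours" row reflects a two-stage pipeline: the detector first localizes ships, then the fine-grained classifier labels each crop. A minimal sketch of that wiring, with `detector` and `classifier` as hypothetical callables standing in for YOLOv5s and the proposed network:

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixel coordinates

def detect_then_classify(
    image: Sequence[Sequence[int]],
    detector: Callable[[Sequence[Sequence[int]]], List[Box]],
    classifier: Callable[[List[Sequence[int]]], str],
) -> List[Tuple[Box, str]]:
    """Stage 1: class-agnostic ship detection on the full image.
    Stage 2: crop each detected box and assign a fine-grained class."""
    labeled = []
    for x1, y1, x2, y2 in detector(image):
        crop = [row[x1:x2] for row in image[y1:y2]]  # row-major crop
        labeled.append(((x1, y1, x2, y2), classifier(crop)))
    return labeled
```

Splitting the task this way lets the detector focus on "ship vs. background" while the classifier sees tight, resized crops — consistent with the mAP gain over direct five-class detection reported above.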
Figure 11. Detection and classification results.