Xiaomei Gong, Yuxin Zhou, Yi Zhang.
Abstract
Siamese networks are a prevailing solution for visual tracking, achieving high performance through convolutional neural networks and weight-sharing schemes. Most existing Siamese networks adopt offline training strategies and realize precise tracking by comparing the extracted target features with template features. However, their performance may degrade when dealing with unknown targets. A tracker trained only offline cannot learn background information, so it is susceptible to background interference, which can ultimately lead to tracking failure. In this paper, we propose a twin-branch architecture (dubbed SiamOT) to mitigate this problem, wherein one branch is a classical Siamese network and the other is an online training branch. In particular, the proposed online branch uses feature fusion and an attention mechanism to capture and update both target and background information, thereby refining the description of the target. Extensive experiments on three mainstream benchmarks, along with an ablation study, validate the effectiveness of SiamOT. The results show that SiamOT achieves superior performance with stronger target discrimination abilities.
Keywords: Siamese networks; online training; visual tracking
Year: 2022 PMID: 36081056 PMCID: PMC9460846 DOI: 10.3390/s22176597
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
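The abstract describes Siamese trackers as comparing extracted target features with template features. A minimal sketch of that comparison, assuming the sliding-window cross-correlation used by SiamFC-style trackers, is given below. The function names `cross_correlate` and `peak_location` are hypothetical, the inputs are toy single-channel maps rather than CNN features, and none of this is taken from the SiamOT implementation.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` (both 2-D lists) and return the response map.

    Assumption: plain single-channel cross-correlation on raw values,
    standing in for the feature-level matching of a Siamese tracker.
    """
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for i in range(sh - th + 1):
        scores = []
        for j in range(sw - tw + 1):
            # Inner product between the template and the window at (i, j).
            score = sum(
                search[i + u][j + v] * template[u][v]
                for u in range(th)
                for v in range(tw)
            )
            scores.append(score)
        response.append(scores)
    return response


def peak_location(response):
    """Return the (row, col) of the highest response value."""
    best_row, best_col, best_score = 0, 0, response[0][0]
    for i, scores in enumerate(response):
        for j, score in enumerate(scores):
            if score > best_score:
                best_row, best_col, best_score = i, j, score
    return best_row, best_col


template = [[1.0, 2.0], [3.0, 4.0]]
search = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 3.0, 4.0],
    [0.0, 0.0, 0.0, 0.0],
]
response = cross_correlate(search, template)
row, col = peak_location(response)
print(row, col)  # 1 2
```

The peak of the response map marks where the search region best matches the template. A fixed template of this kind carries no background information, which is the limitation the online branch is meant to address.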
Figure 1. Comparative results among different trackers.
Figure 2. The overall structure of SiamOT.
Figure 3. The online training (OT) branch of our network.
Figure 4. The architecture of the proposed attention module in SiamOT.
Experimental environment.
| Category | Item | Configuration |
|---|---|---|
| Hardware | GPU | Nvidia RTX 3090 |
| Hardware | CPU | Intel® Core™ i7-9700K |
| Software | OS | Ubuntu 20.04 |
| Software | Framework | PyTorch 1.7.0 |
| Software | Language | Python |
Comparative results on OTB100.
| Trackers | SUCC. | PREC. |
|---|---|---|
| GradNet | 0.639 | 0.861 |
| UpdateNet | 0.647 | 0.861 |
| ATOM | 0.671 | 0.882 |
| MDNet | 0.678 | 0.909 |
| DSDCF | 0.667 | 0.784 |
| DiMP | 0.688 | 0.900 |
| ECO | 0.691 | 0.910 |
| DaSiamRPN | 0.658 | 0.880 |
| SiamFC | 0.587 | 0.722 |
| SiamFC++ | 0.682 | 0.896 |
| SiamBAN | 0.696 | 0.910 |
| SiamRPN++ | 0.696 | 0.915 |
| SiamRCNN | 0.700 | 0.891 |
| DROL-RPN | 0.715 | 0.934 |
| Ours | 0.707 | 0.922 |
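The SUCC. and PREC. columns report success and precision scores. As a rough illustration of how such scores are commonly computed on OTB-style benchmarks, the sketch below assumes that success is the area under the curve of overlap (IoU) thresholds from 0 to 1 and that precision is the fraction of frames whose center error is within 20 pixels. The exact evaluation protocol behind the table is not given in this record, and the function names and the `(x, y, w, h)` box format are illustrative choices.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(a, b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    return math.hypot(
        (a[0] + a[2] / 2) - (b[0] + b[2] / 2),
        (a[1] + a[3] / 2) - (b[1] + b[3] / 2),
    )


def success_auc(preds, gts, steps=21):
    """Mean success rate over evenly spaced IoU thresholds in [0, 1]."""
    thresholds = [k / (steps - 1) for k in range(steps)]
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


def precision_at(preds, gts, pixels=20.0):
    """Fraction of frames whose center error is at most `pixels` (assumed 20)."""
    errors = [center_error(p, g) for p, g in zip(preds, gts)]
    return sum(e <= pixels for e in errors) / len(errors)


# Two toy frames: one perfect prediction, one complete miss.
preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
gts = [(0, 0, 10, 10), (100, 100, 10, 10)]
print(precision_at(preds, gts))  # 0.5
print(round(success_auc(preds, gts), 3))  # 0.476
```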
Figure 5. Comparative results of success rates and precision on OTB100.
Figure 6. Comparisons of the attributes on OTB100.
Figure 7. Qualitative comparison of our tracker with other popular trackers on 4 video sequences from OTB100.
Figure 8. Comparison of tracking performance on LaSOT.
Comparative results on NFS.
| Trackers | SUCC. |
|---|---|
| UPDT | 0.537 |
| MDNet | 0.422 |
| C-COT | 0.488 |
| ECO | 0.466 |
| ATOM | 0.584 |
| DiMP | 0.620 |
| SiamBAN | 0.594 |
| Ours | 0.630 |
Comparison of different feature fusion methods.
| Fusion Strategy | SUCC. | PREC. |
|---|---|---|
| Addition | 0.704 | 0.916 |
| Concatenation | 0.706 | 0.915 |
| Multiplication | 0.704 | 0.920 |
| Weighted Fusion | 0.707 | 0.922 |
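The four fusion strategies compared above can be illustrated on flat feature vectors. This is a minimal sketch, not the SiamOT code: the `fuse` helper and its argument names are hypothetical, and the weighted variant is assumed to be a convex combination `weight * a + (1 - weight) * b`, a form this record does not confirm.

```python
def fuse(a, b, strategy, weight=0.5):
    """Combine two equal-length feature vectors with the named strategy.

    Assumption: "weighted" means weight * a + (1 - weight) * b.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    if strategy == "addition":
        return [x + y for x, y in zip(a, b)]
    if strategy == "concatenation":
        # Concatenation doubles the feature dimension instead of mixing values.
        return list(a) + list(b)
    if strategy == "multiplication":
        return [x * y for x, y in zip(a, b)]
    if strategy == "weighted":
        return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]
    raise ValueError(f"unknown fusion strategy: {strategy}")


a = [1.0, 2.0]
b = [3.0, 4.0]
print(fuse(a, b, "addition"))        # [4.0, 6.0]
print(fuse(a, b, "concatenation"))   # [1.0, 2.0, 3.0, 4.0]
print(fuse(a, b, "multiplication"))  # [3.0, 8.0]
print(fuse(a, b, "weighted", 0.25))  # [2.5, 3.5]
```

Only concatenation changes the output dimension, so any layer consuming the fused features would need a matching input width in that case.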
Comparison of results of different attention mechanisms on NFS.
| Attention Mechanism | SUCC. |
|---|---|
| None | 0.503 |
| Self-Attention | 0.565 |
| Self-Attention + Spatial Attention | 0.589 |
| Self-Attention + Channel Attention | 0.591 |
| Self-Attention + Spatial Attention + Channel Attention | 0.630 |
The comparative results of success and precision using different weights.
| Weight | SUCC. | PREC. |
|---|---|---|
| 0.1 | 0.693 | 0.907 |
| 0.2 | 0.688 | 0.896 |
| 0.3 | 0.698 | 0.909 |
| 0.4 | 0.704 | 0.919 |
| 0.5 | 0.705 | 0.911 |
| 0.6 | 0.706 | 0.921 |
| 0.7 | 0.701 | 0.912 |
| 0.8 | 0.707 | 0.922 |
| 0.9 | 0.701 | 0.915 |
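Reading off the best setting from the table above can be done mechanically. The snippet below copies the table values into a dictionary (the name `results` is an illustrative choice) and ranks the weights by success first, then precision.

```python
# (SUCC., PREC.) for each weight value, copied from the table above.
results = {
    0.1: (0.693, 0.907),
    0.2: (0.688, 0.896),
    0.3: (0.698, 0.909),
    0.4: (0.704, 0.919),
    0.5: (0.705, 0.911),
    0.6: (0.706, 0.921),
    0.7: (0.701, 0.912),
    0.8: (0.707, 0.922),
    0.9: (0.701, 0.915),
}

# Tuples compare element by element, so this ranks by SUCC. and breaks ties on PREC.
best_weight = max(results, key=lambda w: results[w])
print(best_weight, results[best_weight])  # 0.8 (0.707, 0.922)
```

A weight of 0.8 gives the highest value in both columns, and its scores coincide with the Weighted Fusion row of the fusion comparison and with the "Ours" row on OTB100.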