Di Yuan¹, Xiu Shu², Qiao Liu³, Xinming Zhang², Zhenyu He⁴.
Abstract
When dealing with complex thermal infrared (TIR) tracking scenarios, a single category of feature is not sufficient to portray the appearance of the target, which drastically affects the accuracy of TIR target tracking methods. To address this problem, we propose an adaptive multi-feature fusion model (AMFT) for the TIR tracking task. Specifically, our AMFT tracking method adaptively integrates hand-crafted features and deep convolutional neural network (CNN) features, exploiting the complementarity between the different features to locate the target accurately. In addition, a simple but effective model update strategy is adopted to adapt the model to changes of the target during tracking. Ablation studies show that the adaptive multi-feature fusion model in our AMFT tracking method is very effective. Our AMFT tracker performs favorably against state-of-the-art trackers on the PTB-TIR and LSOTB-TIR benchmarks.
Keywords: Model update; Multi-feature fusion; Thermal infrared tracking
Year: 2022 PMID: 36245795 PMCID: PMC9553631 DOI: 10.1007/s00521-022-07867-1
Source DB: PubMed Journal: Neural Comput Appl ISSN: 0941-0643 Impact factor: 5.102
Fig. 1 Tracking examples. AMFT_H denotes the tracking results obtained with hand-crafted features only, while AMFT_D denotes the tracking results obtained with deep convolutional neural network features only.
Fig. 2 Overview of the proposed multi-feature fusion model for TIR target tracking
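The abstract does not specify how AMFT weights the two feature types against each other. Purely as a hypothetical illustration of the idea of adaptive fusion, the sketch below combines a hand-crafted and a deep response map with weights proportional to each map's peak-to-sidelobe ratio (PSR), a reliability score often used in correlation-filter tracking, and applies a linear-interpolation model update. The names `psr`, `fusion_weights`, `fuse`, `update`, the exclusion window and the learning rate are all assumptions, not the authors' method.

```python
import math
from typing import List, Tuple

Map = List[List[float]]  # 2-D response map indexed as [row][col]


def argmax2d(m: Map) -> Tuple[int, int]:
    """Return (row, col) of the largest value, keeping the first one on ties."""
    best_i, best_j = 0, 0
    for i, row in enumerate(m):
        for j, v in enumerate(row):
            if v > m[best_i][best_j]:
                best_i, best_j = i, j
    return best_i, best_j


def psr(m: Map, exclude: int = 2) -> float:
    """Peak-to-sidelobe ratio: (peak - sidelobe mean) / sidelobe std.

    The sidelobe is every value outside a (2*exclude+1)^2 window around the peak.
    """
    pi, pj = argmax2d(m)
    peak = m[pi][pj]
    side = [v for i, row in enumerate(m) for j, v in enumerate(row)
            if abs(i - pi) > exclude or abs(j - pj) > exclude]
    if not side:
        raise ValueError("response map is too small for the exclusion window")
    mu = sum(side) / len(side)
    sd = math.sqrt(sum((v - mu) ** 2 for v in side) / len(side))
    return (peak - mu) / (sd + 1e-12)  # epsilon guards a perfectly flat sidelobe


def fusion_weights(hand: Map, deep: Map) -> Tuple[float, float]:
    """Hypothetical adaptive weights: each map's share of the total PSR."""
    q_h, q_d = psr(hand), psr(deep)
    total = q_h + q_d
    if total <= 0.0:  # both maps flat: fall back to equal weights
        return 0.5, 0.5
    return q_h / total, q_d / total


def fuse(hand: Map, deep: Map) -> Map:
    """Element-wise weighted sum of two same-sized response maps."""
    a, b = fusion_weights(hand, deep)
    return [[a * h + b * d for h, d in zip(rh, rd)] for rh, rd in zip(hand, deep)]


def update(model: Map, new: Map, lr: float = 0.01) -> Map:
    """Linear-interpolation update: model <- (1 - lr) * model + lr * new."""
    return [[(1.0 - lr) * m + lr * n for m, n in zip(rm, rn)]
            for rm, rn in zip(model, new)]


def gaussian_map(size: int, ci: int, cj: int, sigma: float,
                 base: float = 0.0, amp: float = 1.0) -> Map:
    """Synthetic response map: a Gaussian bump centred at (ci, cj)."""
    return [[base + amp * math.exp(-((i - ci) ** 2 + (j - cj) ** 2) / (2 * sigma ** 2))
             for j in range(size)] for i in range(size)]


hand = gaussian_map(9, 2, 2, sigma=1.0)                     # sharp, confident peak
deep = gaussian_map(9, 6, 6, sigma=3.0, base=0.5, amp=0.1)  # flat, ambiguous peak
print(fusion_weights(hand, deep))  # the sharper map receives most of the weight
print(argmax2d(fuse(hand, deep)))  # (2, 2)
```

Here the sharply peaked map dominates the fused response, so the predicted location follows the more reliable feature. In a real tracker the update would be applied to the learned filter or template rather than to a response map, and another reliability score (for example APCE) could replace PSR.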
Ablation studies on the PTB-TIR [22] benchmark
| Trackers | Hand-crafted feature | Deep feature | Precision (%) | AUC (%) | Speed (fps) |
|---|---|---|---|---|---|
| AMFT_H | ✓ | – | 78.0 | 58.9 | 10.6 |
| AMFT_D | – | ✓ | 74.4 | 54.5 | 7.8 |
| AMFT | ✓ | ✓ | 81.1 | 61.1 | 7.2 |
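The ablation table reports precision and AUC. Assuming PTB-TIR and LSOTB-TIR follow the usual OTB-style one-pass convention (precision is the fraction of frames whose predicted centre lies within 20 pixels of the ground truth; AUC is the mean success rate over IoU thresholds from 0 to 1), the per-sequence scores can be sketched as below. The `(x, y, w, h)` box format, the function names and the 21-threshold grid are assumptions borrowed from that convention, not details taken from this paper.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h): top-left corner plus size, in pixels


def center_error(a: Box, b: Box) -> float:
    """Euclidean distance between the centres of two boxes."""
    ax, ay = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx, by = b[0] + b[2] / 2, b[1] + b[3] / 2
    return math.hypot(ax - bx, ay - by)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[0] + a[2], b[0] + b[2]), min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def precision(pred: List[Box], gt: List[Box], threshold: float = 20.0) -> float:
    """Fraction of frames whose centre error is within `threshold` pixels."""
    errors = [center_error(p, g) for p, g in zip(pred, gt)]
    return sum(e <= threshold for e in errors) / len(errors)


def success_auc(pred: List[Box], gt: List[Box], steps: int = 21) -> float:
    """Mean success rate (fraction of frames with IoU > t) over t in [0, 1]."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [k / (steps - 1) for k in range(steps)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


gt = [(0.0, 0.0, 10.0, 10.0)] * 4
pred = [(0.0, 0.0, 10.0, 10.0),    # perfect
        (5.0, 0.0, 10.0, 10.0),    # shifted 5 px, IoU = 1/3
        (30.0, 30.0, 10.0, 10.0),  # target lost, IoU = 0
        (0.0, 0.0, 10.0, 10.0)]    # perfect
print(precision(pred, gt))    # 0.75
print(success_auc(pred, gt))  # 11.75 / 21, about 0.56
```

Benchmark-level figures such as those in the table are then aggregated over all test sequences; the exact aggregation used by each benchmark toolkit is not stated here.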
Fig. 3 Experimental comparison on the PTB-TIR [22] benchmark
Fig. 4 Comparison of tracking speed and tracking accuracy on the LSOTB-TIR [23] benchmark
Fig. 5 Experimental comparison on the PTB-TIR [22] benchmark for selected attributes
Fig. 6 Experimental comparison on the LSOTB-TIR [23] benchmark
Success scores (%) on the LSOTB-TIR [23] benchmark for 12 attributes: scale variation (SV), fast motion (FM), motion blur (MB), distractor (DIS), low resolution (LR), intensity variation (IV), out-of-view (OV), background clutter (BC), deformation (DEF), aspect ratio variation (ARV), occlusion (OCC), and thermal crossover (TC)
| Trackers | AMFT (ours) | TADT | MLSSNet | MCCT | GFSDCF | SRDCF | CREST | Staple | HSSNet | HDT | HCF | CFNet | SiamFC | SiamTri | MDNet | UDT | MCFTS | VITAL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SV | 61.9 | 47.9 | 56.4 | 45.3 | 53.3 | 40.2 | 32.4 | 32.0 | 48.9 | 54.5 | 52.5 | 62.0 | 57.6 | 47.9 | ||||
| FM | 45.8 | 50.2 | 51.4 | 51.5 | 44.5 | 40.3 | 44.3 | 41.6 | 39.7 | 57.4 | 56.8 | 57.3 | 53.4 | 47.8 | 58.6 | |||
| MB | 58.8 | 41.5 | 46.9 | 52.2 | 50.8 | 44.9 | 37.4 | 42.0 | 36.9 | 37.4 | 54.5 | 56.1 | 57.0 | 48.6 | 40.4 | |||
| DIS | 57.7 | 46.6 | 55.8 | 56.1 | 50.5 | 53.8 | 40.1 | 41.2 | 40.8 | 40.6 | 46.5 | 46.4 | 54.5 | 48.0 | 57.7 | |||
| LR | 65.0 | 51.8 | 53.6 | 58.1 | 47.7 | 49.4 | 45.8 | 38.9 | 37.1 | 46.2 | 62.3 | 63.7 | 57.9 | 36.8 | 60.9 | |||
| IV | 38.4 | 45.5 | 64.3 | 47.6 | 34.1 | 33.8 | 28.1 | 37.5 | 37.6 | 46.3 | 41.9 | 61.5 | 50.2 | 52.0 | 61.8 | |||
| OV | 47.9 | 54.1 | 56.9 | 56.5 | 49.3 | 40.7 | 45.0 | 43.0 | 45.9 | 56.0 | 56.6 | 58.2 | 55.4 | 51.3 | 59.7 | |||
| BC | 56.7 | 45.8 | 52.5 | 54.0 | 46.9 | 51.7 | 41.3 | 38.5 | 39.5 | 44.6 | 49.6 | 49.4 | 47.5 | 45.9 | 58.4 | |||
| DEF | 55.7 | 41.7 | 47.2 | 56.8 | 46.5 | 52.0 | 44.1 | 37.9 | 43.2 | 43.2 | 33.8 | 51.8 | 51.8 | 48.3 | 45.0 | |||
| ARV | 51.3 | 43.2 | 40.7 | 42.8 | 45.5 | 40.0 | 43.1 | 40.3 | 40.9 | 40.7 | 49.6 | 50.2 | 50.6 | 47.5 | 54.8 | |||
| OCC | 55.9 | 44.6 | 51.6 | 56.5 | 47.3 | 50.8 | 48.4 | 38.0 | 41.3 | 42.8 | 38.6 | 48.9 | 48.4 | 51.4 | 48.4 | |||
| TC | 51.5 | 51.7 | 32.1 | 49.5 | 42.5 | 48.4 | 48.5 | 35.0 | 41.7 | 46.0 | 34.8 | 47.7 | 43.4 | 41.7 | 43.5 |
Success scores (%) on the LSOTB-TIR [23] benchmark for 4 scenarios: handheld camera (HH), vehicle-mounted camera (VM), drone-mounted camera (DM), and surveillance camera (VS)
| Trackers | AMFT (ours) | TADT | MLSSNet | MCCT | GFSDCF | SRDCF | CREST | Staple | HSSNet | HDT | HCF | CFNet | SiamFC | SiamTri | MDNet | UDT | MCFTS | VITAL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HH | 41.1 | 44.8 | 56.5 | 48.0 | 51.9 | 42.4 | 39.1 | 42.2 | 40.8 | 33.9 | 54.0 | 55.3 | 56.8 | 42.2 | 44.0 | |||
| VM | 68.6 | 54.2 | 67.4 | 45.2 | 66.3 | 44.8 | 35.8 | 31.4 | 58.7 | 56.2 | 52.0 | 71.7 | 57.9 | 72.1 | ||||
| DM | 53.4 | 44.4 | 44.9 | 43.8 | 47.3 | 42.0 | 36.9 | 39.6 | 43.3 | 38.4 | 53.0 | 51.6 | 43.1 | 45.9 | 54.5 | |||
| VS | 47.1 | 55.5 | 57.6 | 52.2 | 53.6 | 51.0 | 42.9 | 41.4 | 42.8 | 41.7 | 46.6 | 47.2 | 52.7 | 47.7 | 57.5 |
Fig. 7 Qualitative comparison of our AMFT tracking method with the VITAL [59], GFSDCF [60], MDNet [61], TADT [58], and SRDCF [38] tracking methods on several TIR test video sequences (from top to bottom: dog-D-002, street-S-001, bus-S-004, person-V-007, and leopard-H-001)
Fig. 8 Failure cases (from top to bottom: campus2, stranger3, and saturated). Results of the proposed AMFT tracker are shown as red boxes and the target ground truth as green boxes.