Khizer Mehmood, Ahmad Ali, Abdul Jalil, Baber Khan, Khalid Mehmood Cheema, Maria Murad, Ahmad H Milyani.
Abstract
Visual object tracking (VOT) is a vital component of many computer vision applications, such as surveillance, unmanned aerial vehicles (UAVs), and medical diagnostics. In recent years, substantial progress has been made on the core challenges of VOT, such as scale change, occlusion, motion blur, and illumination variation. This paper proposes a tracking algorithm within the spatiotemporal context (STC) framework. To overcome the limitations of STC under scale variation, a max-pooling-based scale scheme is incorporated by maximizing over the posterior probability. To prevent the target model from drifting, an efficient occlusion-handling mechanism is proposed: occlusion is detected by comparing the average peak-to-correlation energy (APCE) of the response map between consecutive frames, and once occlusion is detected, a fractional-gain Kalman filter is employed to handle it. As a further extension, an APCE-based criterion adapts the target model under motion blur and other factors. Extensive evaluation indicates that the proposed algorithm achieves significant results against various tracking methods.
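The APCE-based occlusion test described in the abstract can be sketched in a few lines. The APCE definition below follows the standard formulation (squared peak-to-minimum gap over the mean squared deviation of the response map); the comparison threshold `ratio` and the use of a running APCE average over previous frames are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def apce(response):
    """Average peak-to-correlation energy of a correlation response map.

    APCE = |F_max - F_min|^2 / mean((F - F_min)^2)

    A sharp single peak (confident track) yields a high APCE; a flat or
    multi-peaked map (occlusion, motion blur) yields a low APCE.
    """
    f_max = response.max()
    f_min = response.min()
    denom = np.mean((response - f_min) ** 2)
    return (f_max - f_min) ** 2 / denom

def occluded(apce_current, apce_history, ratio=0.3):
    """Flag occlusion when the current APCE drops well below the running
    average of previous frames. `ratio` is an illustrative threshold."""
    return apce_current < ratio * np.mean(apce_history)
```

A sharp, confident response map scores far higher than a flat or noisy one, which is what makes a simple relative threshold workable across sequences.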
Keywords: APCE; fractional-gain Kalman filter; image processing; object tracking
Year: 2021 PMID: 34960574 PMCID: PMC8706150 DOI: 10.3390/s21248481
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1Challenging scenarios in visual object tracking (VOT). The first row shows motion blur in an image sequence. The second row shows the scale variation of the target. The third row shows heavy occlusion of the target. Pictures in the figure are part of OTB-100 dataset [26].
Figure 2The spatial relation between object and its context. Picture in the figure is part of OTB-100 dataset [26].
Figure 3Flowchart of proposed tracking method.
Figure 4Occlusion detection mechanism. Pictures in the figure are part of OTB-100 dataset [26].
Figure 5Learning rate mechanism. Pictures in the figure are part of OTB-100 dataset [26].
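The occlusion-handling step relies on a Kalman filter whose gain is attenuated by a fractional factor, so that unreliable measurements during occlusion pull the estimate less strongly away from the motion model. A minimal constant-velocity sketch follows; the state layout, noise levels, and the scalar `gain_scale` attenuation are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def make_cv_model(dt=1.0):
    # Constant-velocity state [x, y, vx, vy] with position-only measurements.
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
    return F, H

def kalman_step(x, P, z, F, H, q=1e-2, r=1.0, gain_scale=0.5):
    """One predict/update cycle with an attenuated ('fractional') gain.

    gain_scale in (0, 1] scales the Kalman gain; with a detected
    occlusion the measurement z is unreliable, so a reduced gain keeps
    the trajectory closer to the constant-velocity prediction.
    """
    Q = q * np.eye(4)
    R = r * np.eye(2)
    # Predict.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with attenuated gain.
    S = H @ P @ H.T + R
    K = gain_scale * (P @ H.T @ np.linalg.inv(S))
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P
```

With `gain_scale` below 1 the corrected position lands closer to the prediction than a standard Kalman update would, which is the intended behavior while the target is occluded.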
Distance precision rate.
| Sequence | Proposed | Modified KCF | STC | MACF | MOSSECA | KCF_MTSA |
|---|---|---|---|---|---|---|
| Blurcar1 | 0.978 | 0.858 | 0.024 | 0.698 | 0.999 | 0.999 |
| Blurcar3 | 0.896 | 0.829 | 0.406 | 1 | 1 | 1 |
| Blurcar4 | 0.876 | 0.987 | 0.113 | 0.944 | 1 | 1 |
| Boy | 0.973 | 0.64 | 0.761 | 1 | 1 | 1 |
| Car2 | 0.988 | 1 | 1 | 1 | 0.993 | 1 |
| Dancer2 | 0.993 | 1 | 1 | 1 | 1 | 1 |
| Human7 | 0.904 | 0.76 | 0.332 | 0.636 | 0.824 | 0.448 |
| Jogging1 | 0.973 | 0.993 | 0.228 | 0.231 | 0.231 | 0.964 |
| Jogging2 | 0.866 | 0.945 | 0.186 | 0.166 | 1 | 0.189 |
| Suv | 0.778 | 0.978 | 0.805 | 0.978 | 0.976 | 0.98 |
Average center location error.
| Sequence | Proposed | Modified KCF | STC | MACF | MOSSECA | KCF_MTSA |
|---|---|---|---|---|---|---|
| Blurcar1 | 4.86 | 16.05 | 1.31 × 10⁶ | 85.16 | 6.34 | 6.01 |
| Blurcar3 | 9.12 | 14.46 | 71.37 | 3.69 | 2.98 | 3.7 |
| Blurcar4 | 15.01 | 11.19 | 2.61 × 10³ | 8.04 | 10.15 | 7.15 |
| Boy | 8.09 | 50.34 | 27.4 | 2.65 | 2.31 | 2.91 |
| Car2 | 2.68 | 3.96 | 12.43 | 1.55 | 5.39 | 2.13 |
| Dancer2 | 6.82 | 6.41 | 15.3 | 6.48 | 5.8 | 6.68 |
| Human7 | 7.59 | 16.74 | 42.98 | 19.62 | 12.14 | 36.63 |
| Jogging1 | 8.39 | 3.72 | 5010 | 94.97 | 115.98 | 4.27 |
| Jogging2 | 14.2 | 4.74 | 104.02 | 147.77 | 3.47 | 136.4 |
| Suv | 15.36 | 3.65 | 48 | 3.34 | 3.73 | 3.71 |
Figure 6Precision plot comparison for the OTB-100 dataset [26].
Figure 7Center location error (in pixels) comparison for the OTB-100 dataset [26].
Frames per second (fps).
| Sequence | Proposed | Modified KCF | STC | MACF | MOSSECA | KCF_MTSA |
|---|---|---|---|---|---|---|
| Blurcar1 | 10.78 | 66.29 | 27.75 | 18.5 | 53.06 | 15.35 |
| Blurcar3 | 18.04 | 33.62 | 28.87 | 32.7 | 51.74 | 6.08 |
| Blurcar4 | 5.7 | 21.42 | 20.07 | 8.64 | 27.65 | 5.83 |
| Boy | 26.67 | 85.51 | 33.48 | 58.7 | 157.17 | 22.02 |
| Car2 | 57.18 | 90.7 | 94.08 | 55.3 | 95.38 | 11.2 |
| Dancer2 | 29.66 | 29.65 | 65.1 | 29.2 | 38.87 | 6.26 |
| Human7 | 25.17 | 34.44 | 59.66 | 40.5 | 26.11 | 11.48 |
| Jogging1 | 42.71 | 95.45 | 61.75 | 49 | 36.59 | 12.55 |
| Jogging2 | 22.77 | 33.01 | 56.92 | 34.6 | 33.97 | 11 |
| Suv | 69.61 | 76.32 | 98.03 | 50.9 | 79.7 | 8.44 |
Computation time of the proposed tracker’s learning rate module.
| Sequence | Frame Size | Number of Frames | Time (s) |
|---|---|---|---|
| Blurcar1 | 640 × 480 | 742 | 0.011 |
| Blurcar3 | 640 × 480 | 357 | 0.008 |
| Blurcar4 | 640 × 480 | 380 | 0.009 |
| Boy | 640 × 480 | 602 | 0.009 |
| Car2 | 320 × 240 | 913 | 0.018 |
| Dancer2 | 320 × 262 | 150 | 0.006 |
| Human7 | 320 × 240 | 250 | 0.007 |
| Jogging1 | 352 × 288 | 307 | 0.012 |
| Jogging2 | 352 × 288 | 307 | 0.008 |
| Suv | 320 × 240 | 945 | 0.017 |
Figure 8Qualitative comparison for the OTB-100 dataset [26].