Baoxian Wang, Linbo Tang, Jinglin Yang, Baojun Zhao, Shuigen Wang.
Abstract
Most existing sparse-representation-based visual trackers suffer from both high computational cost and poor robustness. To address these issues, a novel tracking method is presented that combines sparse representation with an emerging learning technique, the extreme learning machine (ELM). Specifically, visual tracking is divided into two consecutive processes. First, ELM is used to find the optimal separating hyperplane between target observations and background ones. The trained ELM classification function can then efficiently remove most candidate samples belonging to background content, reducing the total computational cost of the subsequent sparse representation. Second, to couple ELM and sparse representation further, the confidence values (i.e., probabilities of being the target) that the ELM classification function assigns to the samples are used to construct a new manifold-learning constraint term in the sparse-representation framework, which tends to yield more robust results. Moreover, the accelerated proximal gradient (APG) method is used to derive the optimal solution (in matrix form) of the constrained sparse tracking model. The matrix-form solution also allows the candidate samples to be evaluated in parallel, leading to higher efficiency. Experiments demonstrate the effectiveness of the proposed tracker.
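The first stage described above (an ELM that prunes background-like candidates before sparse coding) can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the function names, the tanh activation, and the `keep_ratio` pruning rule are assumptions; the paper's own parameters (hidden-node number, regularization, proportion) are listed later in the parameter table.

```python
import numpy as np

def train_elm(X, y, n_hidden=100, mu=1e-2, seed=0):
    """Basic ELM: random fixed hidden layer, closed-form ridge output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # random input weights (never trained)
    b = rng.standard_normal(n_hidden)                 # random biases (never trained)
    H = np.tanh(X @ W + b)                            # hidden-layer activations
    # Regularized least squares: beta = (H^T H + mu I)^{-1} H^T y
    beta = np.linalg.solve(H.T @ H + mu * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def elm_confidence(X, model):
    """Real-valued confidence of being the target; sign gives the class."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

def prune_candidates(X_cand, model, keep_ratio=0.25):
    """Keep only the highest-confidence fraction of candidates for sparse coding."""
    scores = elm_confidence(X_cand, model)
    k = max(1, int(keep_ratio * len(scores)))
    keep = np.argsort(scores)[::-1][:k]               # indices sorted by descending score
    return X_cand[keep], scores[keep]
```

Because the hidden layer is random and only the output weights are solved in closed form, training is a single linear solve, which is what makes the pruning stage cheap enough to run every frame.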
Keywords: accelerated proximal gradient; extreme learning machine; manifold learning; sparse representation; visual tracking
Year: 2015 PMID: 26506359 PMCID: PMC4634458 DOI: 10.3390/s151026877
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Flowchart of the proposed tracking algorithm.
The video sequences used for the tracker evaluation.
| No. | Image Sequence | No. of Frames | Challenging Factors |
|---|---|---|---|
| 1 | Car4 | 659 | illumination variation, background noise, scale change |
| 2 | Car11 | 393 | illumination variation, background clutter, scale change |
| 3 | Singer1 | 321 | illumination variation, scale change |
| 4 | Caviar1 | 382 | partial occlusion, scale change |
| 5 | Caviar2 | 500 | partial occlusion, scale change |
| 6 | Occlusion1 | 898 | partial occlusion |
| 7 | Occlusion2 | 819 | partial occlusion, object rotation |
| 8 | Mhyang | 1490 | illumination variation, pose variation |
| 9 | Girl | 500 | pose variation, partial occlusion, scale change |
| 10 | DavidIndoor | 462 | illumination variation, pose variation, object rotation |
| 11 | Dudek | 1145 | partial occlusion, pose variation, scale change |
| 12 | Dog1 | 1350 | large scale change, object rotation |
| 13 | CarScale | 252 | background clutter, scale change |
| 14 | boy | 602 | abrupt motion, motion blur |
Performance comparisons in terms of center location error (CLE, in pixels; lower is better) and VOC overlap rate (VOR, in [0, 1]; higher is better). The best results are shown in bold red font. MIL, multiple instance learning tracker; CT, compressive tracker; Frag, fragments-based tracker; APG, accelerated proximal gradient.
| Sequence | CT CLE | CT VOR | MIL CLE | MIL VOR | K-SVM CLE | K-SVM VOR | Frag CLE | Frag VOR | L1 CLE | L1 VOR | L1APG CLE | L1APG VOR | Proposed CLE | Proposed VOR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Car4 | 63.5 | 0.21 | 60.1 | 0.34 | 17.8 | 0.48 | 179 | 0.22 | 4.11 | 0.84 | 6.63 | 0.82 | ||
| Car11 | 16.7 | 0.44 | 43.5 | 0.17 | 17.2 | 0.43 | 63.9 | 0.09 | 33.3 | 0.43 | 1.89 | 0.79 | ||
| Singer1 | 16.1 | 0.34 | 15.2 | 0.34 | 31.6 | 0.38 | 22.1 | 0.34 | 4.57 | 0.71 | 4.43 | 0.69 | ||
| Caviar1 | 16.7 | 0.52 | 48.2 | 0.26 | 3.79 | 0.69 | 5.53 | 0.68 | 119 | 0.28 | 56.4 | 0.29 | ||
| Caviar2 | 66.3 | 0.29 | 69.8 | 0.25 | 5.62 | 0.71 | 5.64 | 0.56 | 3.34 | 0.79 | 62.1 | 0.35 | ||
| Occlusion1 | 16.7 | 0.75 | 32.3 | 0.59 | 9.63 | 0.81 | 6.51 | 0.87 | 10.7 | 0.76 | 5.95 | 0.85 | ||
| Occlusion2 | 17.3 | 0.58 | 14.1 | 0.61 | 7.94 | 0.65 | 15.5 | 0.61 | 11.1 | 0.67 | 10.4 | 0.72 | ||
| Mhyang | 13.3 | 0.61 | 20.3 | 0.51 | 5.12 | 12.5 | 0.65 | 3.45 | 0.77 | 5.27 | 0.72 | 0.79 | ||
| Girl | 18.8 | 0.31 | 13.7 | 0.41 | 0.71 | 20.6 | 0.46 | 3.34 | 0.68 | 3.54 | 0.66 | 3.26 | ||
| DavidIndoor | 15.7 | 0.45 | 18.3 | 0.43 | 6.25 | 0.54 | 82.1 | 0.18 | 18.9 | 0.45 | 21.5 | 0.36 | ||
| Dudek | 26.6 | 0.65 | 17.8 | 28.9 | 0.66 | 82.1 | 0.54 | 23.2 | 0.68 | 64.1 | 0.52 | 0.69 | ||
| Dog1 | 6.96 | 0.54 | 7.82 | 0.54 | 6.26 | 0.66 | 12.1 | 0.55 | 3.75 | 0.66 | 3.41 | 0.74 | ||
| CarScale | 26.1 | 0.44 | 33.1 | 0.42 | 44.9 | 0.49 | 19.5 | 0.44 | 81.5 | 0.44 | 68.5 | 0.53 | ||
| boy | 9.03 | 0.51 | 12.8 | 0.51 | 2.64 | 0.72 | 40.5 | 0.39 | 6.98 | 0.74 | 7.21 | 0.77 | ||
| Average | 23.6 | 0.47 | 29.1 | 0.44 | 13.6 | 0.62 | 40.5 | 0.47 | 23.1 | 0.65 | 23.3 | 0.62 | ||
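The two metrics reported in the table above are standard in tracking evaluation: CLE is the pixel distance between predicted and ground-truth box centers, and VOR is the PASCAL VOC intersection-over-union of the two boxes. A minimal sketch, assuming boxes are given as `(x, y, w, h)` tuples (the function names are illustrative, not from the paper):

```python
import numpy as np

def center_location_error(box_a, box_b):
    """Euclidean distance (pixels) between bounding-box centers; boxes are (x, y, w, h)."""
    ca = np.array([box_a[0] + box_a[2] / 2.0, box_a[1] + box_a[3] / 2.0])
    cb = np.array([box_b[0] + box_b[2] / 2.0, box_b[1] + box_b[3] / 2.0])
    return float(np.linalg.norm(ca - cb))

def voc_overlap_rate(box_a, box_b):
    """PASCAL VOC overlap: area(A ∩ B) / area(A ∪ B), boxes as (x, y, w, h)."""
    xa2, ya2 = box_a[0] + box_a[2], box_a[1] + box_a[3]
    xb2, yb2 = box_b[0] + box_b[2], box_b[1] + box_b[3]
    iw = max(0.0, min(xa2, xb2) - max(box_a[0], box_b[0]))   # intersection width
    ih = max(0.0, min(ya2, yb2) - max(box_a[1], box_b[1]))   # intersection height
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0
```

For example, two unit-offset 2×2 boxes overlap by 1/3 under VOR while their CLE is exactly 1 pixel, which is why the two columns can rank trackers differently.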
Speeds and implementations of the tracking methods. Fps, frames per second.
| Tracker | CT | MIL | K-SVM | Frag | L1 | L1APG | Proposed |
|---|---|---|---|---|---|---|---|
| Average Fps | 68 | 34 | 2 | 4 | 0.6 | 2.5 | 7 |
| Software | MATLAB + C | C | C | C | MATLAB + C | MATLAB + C | MATLAB |
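The L1 and L1APG baselines in the speed table, like the proposed tracker, solve an ℓ1-regularized reconstruction; the abstract's APG solver belongs to the same family as FISTA. A minimal FISTA-style sketch for the unconstrained lasso (this is a generic accelerated proximal gradient solver, not the paper's constrained matrix-form model, and `apg_lasso` is an illustrative name):

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def apg_lasso(D, y, lam=0.1, n_iter=200):
    """FISTA for min_a 0.5 * ||y - D a||^2 + lam * ||a||_1."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth gradient
    a = np.zeros(D.shape[1])
    z, t = a.copy(), 1.0
    for _ in range(n_iter):
        grad = D.T @ (D @ z - y)           # gradient of the quadratic term at z
        a_new = soft_threshold(z - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = a_new + ((t - 1.0) / t_new) * (a_new - a)   # Nesterov momentum step
        a, t = a_new, t_new
    return a
```

Because each iteration is only matrix products and an elementwise shrinkage, many candidate samples can be stacked as columns and solved in one batch, which is the parallelism the abstract attributes to the matrix-form solution.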
Figure 2. Representative tracking results on fourteen challenging sequences: (a) Car4 and Car11; (b) Singer1 and DavidIndoor; (c) Mhyang and boy; (d) Dudek and Girl; (e) Caviar1 and Caviar2; (f) Occlusion1 and Occlusion2; (g) Dog1 and CarScale.
Descriptions of the key parameters.
| Section | Parameter |
|---|---|
| 3.1 ELM Classifier Training | ELM hidden nodes number: |
| Regularization parameter: | |
| 3.2 Reducing Candidate Samples | Proportion parameter: |
| 3.3 Constrained Sparse Representation | Regularization parameters: |
| 3.4 Template Updating Scheme | Updating threshold parameter: |
Figure 3. The effects of the parameters: (a) L, (b) μ, (c) ζ, (d) λ, (e) γ, (f) τ.
Figure 4. Some representative tracking results on three video sequences: Singer1, boy and Dudek (from left to right).
Comparisons between ELM and SVM/LS-SVM in terms of VOR (in [0, 1]) and Fps. CSR, constrained sparse representation.
| Sequence | ELM + CSR | SVM + CSR | LS-SVM + CSR |
|---|---|---|---|
| Car4 | 0.78 | 0.76 | |
| Car11 | 0.76 | 0.74 | |
| Singer1 | 0.65 | 0.69 | |
| Caviar1 | 0.82 | 0.81 | |
| Caviar2 | 0.75 | 0.73 | |
| Occlusion1 | 0.56 | 0.51 | |
| Occlusion2 | 0.57 | 0.53 | |
| Mhyang | 0.71 | 0.73 | |
| Girl | 0.54 | 0.42 | |
| DavidIndoor | 0.56 | 0.59 | |
| Dudek | 0.63 | 0.61 | |
| Dog1 | 0.74 | 0.76 | |
| CarScale | 0.75 | 0.71 | |
| boy | 0.76 | 0.73 | |
| Average | 0.68 | 0.67 | |
| Fps | 1 | 6 |
Self-comparisons in terms of VOR (in [0, 1]) and frames per second (Fps).
| Sequence | ELM + CSR | CSR | ELM |
|---|---|---|---|
| Car4 | 0.89 | 0.76 | 0.56 |
| Car11 | 0.85 | 0.69 | 0.71 |
| Singer1 | 0.83 | 0.61 | 0.54 |
| Caviar1 | 0.82 | 0.54 | 0.69 |
| Caviar2 | 0.85 | 0.66 | 0.72 |
| Occlusion1 | 0.85 | 0.71 | 0.69 |
| Occlusion2 | 0.82 | 0.62 | 0.61 |
| Mhyang | 0.79 | 0.69 | 0.75 |
| Girl | 0.71 | 0.57 | 0.63 |
| DavidIndoor | 0.61 | 0.58 | 0.55 |
| Dudek | 0.69 | 0.57 | 0.66 |
| Dog1 | 0.83 | 0.71 | 0.73 |
| CarScale | 0.79 | 0.59 | 0.69 |
| boy | 0.84 | 0.72 | 0.75 |
| Average | 0.80 | 0.64 | 0.66 |
| Fps | 7 | 2 | 10 |
Figure 5. Some tracking results for the self-comparisons on three video sequences: Car4, CarScale and Girl (from left to right).
Figure 6. The performance score for each tracker is shown in the legend. (a) The precision plots of CLE, where the score is the precision at a threshold of 20 pixels; (b) the success plots of VOR, where the score is the AUC value. The second square bracket gives the corresponding Fps of each tracker.