Amirhossein Aghamohammadi, Mei Choo Ang, Elankovan A Sundararajan, Kok Weng Ng, Marzieh Mogharrebi, Seyed Yashar Banihashem.
Abstract
Visual tracking in aerial videos is a challenging task in computer vision and remote sensing because of appearance variations. These variations are caused by camera and target motion, low-resolution noisy images, scale changes, and pose variations. Various approaches have been proposed to deal with appearance variations in aerial videos; among them, spatiotemporal saliency detection has reported promising results for moving target detection. However, it does not detect moving targets accurately when visual tracking is performed under appearance variations. In this study, a visual tracking method based on spatiotemporal saliency and discriminative online learning is proposed to deal with appearance variations. Temporal saliency represents moving target regions and is extracted from the frame difference using the Sauvola local adaptive thresholding algorithm. Spatial saliency represents the target appearance details in candidate moving regions. To detect spatial saliency, SLIC superpixel segmentation, color, and moment features are used to compute the feature uniqueness and spatial compactness saliency measurements. Because this is a time-consuming process, a parallel algorithm was developed to optimize the saliency detection processes and distribute them across multiple processors. Spatiotemporal saliency is then obtained by combining the temporal and spatial saliencies to represent moving targets. Finally, a discriminative online learning algorithm is applied to generate a sample model based on spatiotemporal saliency. This sample model is then incrementally updated to detect the target under appearance variations. Experiments conducted on the VIVID dataset demonstrated that the proposed visual tracking method is effective and computationally efficient compared with state-of-the-art methods.
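The temporal saliency step (frame difference followed by Sauvola local adaptive thresholding) can be illustrated with a minimal, dependency-free sketch. It uses the standard Sauvola threshold T = m(1 + k(s/R - 1)), where m and s are the local mean and standard deviation. The window size, `k`, and `r` values, and the function names `frame_difference`, `sauvola_mask`, and `temporal_saliency`, are illustrative assumptions rather than the authors' settings or code.

```python
from math import sqrt


def frame_difference(prev, curr):
    """Absolute per-pixel difference of two equally sized grayscale frames."""
    return [[abs(c - p) for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def sauvola_mask(image, window=3, k=0.2, r=128.0):
    """Binary mask from Sauvola's threshold T = m * (1 + k * (s / r - 1)).

    m and s are the mean and standard deviation of the window x window
    neighbourhood around each pixel (clipped at the image border).
    window, k and r are assumed placeholder values.
    """
    height, width = len(image), len(image[0])
    half = window // 2
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [image[j][i]
                      for j in range(max(0, y - half), min(height, y + half + 1))
                      for i in range(max(0, x - half), min(width, x + half + 1))]
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            threshold = mean * (1 + k * (sqrt(var) / r - 1))
            # Pixels brighter than the local threshold are marked as moving.
            row.append(1 if image[y][x] > threshold else 0)
        mask.append(row)
    return mask


def temporal_saliency(prev, curr):
    """Candidate moving-region mask: threshold the frame difference."""
    return sauvola_mask(frame_difference(prev, curr))


# Toy example: a 2 x 2 bright target appears on a static background.
previous = [[10] * 5 for _ in range(5)]
current = [row[:] for row in previous]
for y in (1, 2):
    for x in (1, 2):
        current[y][x] = 200
mask = temporal_saliency(previous, current)
```

On real frames an array library would be far faster; plain lists are used here only to keep the sketch self-contained.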
Year: 2018 PMID: 29438421 PMCID: PMC5811006 DOI: 10.1371/journal.pone.0192246
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Review of some related methods.
| Methods | Visual representation | Model Update | Advantages/disadvantages |
|---|---|---|---|
| Zhang et al., [ | Mean shift color segmentation; dense optical flow estimation; affine transformation calculation to represent large segments; pixel-based subordinate degree calculation for segment representation. | Multiple background model estimation; the model is updated by merging similar background models. | The method detects moving targets in complicated conditions, including a moving camera, by using multi-model background estimation. However, the optical flow-based visual representation has a high computational cost, and processing is slow (4 s per frame). The method assumes a fixed target size and cannot detect targets of different sizes. |
| Xianbin et al., [ | Kanade–Lucas–Tomasi (KLT) features for ego-motion estimation. Background KLT features are separated using motion consistency, and the target is represented. Camera ego motion is combined with a particle filter to represent the target position. | The camera ego-motion model is constructed from background features. The HSV color histogram and Hu moments are used to update the model. | The method tracks targets in airborne videos when both the camera and the target are moving. |
| Aeschliman et al., [ | SURF-based features; segmentation of the target from the background. | Spatial distribution of the corresponding pairs in the images, with background modeling. | The method constructs an accurate background model for target tracking when both the camera and the targets are moving, and it tracks targets under appearance variations caused by shadows and lighting changes. However, prior parameter settings for camera calibration are required, and the target representation must be initialized manually in the tracking process. |
| Shen et al., [ | Multi-cue spatial-color sub-regional distribution. Histogram-based (color) contrast. Spatiotemporal saliency target representation. | No background or target appearance modeling. | The method is fast and able to detect the moving target when the camera and target are both moving. |
| Yu et al., [ | Optical flow, tensor voting. | Background modeling. | The method detects moving targets efficiently against noisy backgrounds and under long-term occlusions. |
| Lan et al., [ | Kanade–Lucas–Tomasi (KLT) features; a relative distance change (RDC) measure, based on a classification of matched feature pairs, to represent the target in the background scene. | No background or target appearance modeling. | The method is fast and accurate in moving object detection in airborne video. The RDC measure distinguishes the target from the background scene and is invariant to image rotation, translation, and scaling. However, there is no modeling of the background and target, so the method is not efficient in complicated conditions such as cluttered backgrounds, occlusion, and illumination changes, and it has a high false alarm rate in appearance variation scenarios. |
Fig 1. Our proposed framework.
Fig 2. Visual comparison of thresholding algorithms.
Fig 3. Candidate mask generation.
Fig 4. Parallel algorithm for candidate mask segmentation.
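The idea of distributing per-region work across processors can be sketched with Python's standard `concurrent.futures` pools. This is a generic illustration under stated assumptions, not the authors' parallel algorithm: `mask_area` is a hypothetical stand-in for the real per-mask segmentation job, and `process_masks` with its `use_processes` switch is an invented helper.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def mask_area(mask):
    """Stand-in per-region job: count foreground pixels in a binary mask."""
    return sum(sum(row) for row in mask)


def process_masks(masks, job=mask_area, use_processes=True, workers=None):
    """Apply `job` to every candidate mask using a pool of workers.

    Results come back in the same order as `masks`. A process pool gives
    true CPU parallelism; the thread pool is a lighter fallback for quick
    checks or I/O-bound jobs.
    """
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with pool_cls(max_workers=workers) as pool:
        return list(pool.map(job, masks))
```

When the process pool is used, the job must be a picklable top-level function, and on platforms that spawn worker processes the call should sit under an `if __name__ == "__main__":` guard.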
Fig 5. Segmented sub-regions using SLIC.
(a) A candidate mask (CM) region. (b) Sub-region generation based on the proposed parallel SLIC segmentation algorithm.
Fig 6. Labelling of positive instances in a bag and negative ones for a particular target.
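One common way to label instances in bag-based online trackers is by distance from the current target location: nearby candidates form the positive bag, and candidates in a surrounding ring become negatives. The sketch below illustrates that general idea only. The caption does not state the sampling rule, so the radii `pos_radius` and `neg_radius` and the function `label_instances` are assumptions.

```python
from math import hypot


def label_instances(center, candidates, pos_radius=4.0, neg_radius=12.0):
    """Split candidate locations into a positive bag and negative instances.

    Candidates within `pos_radius` of the target centre go into the positive
    bag; those between `pos_radius` and `neg_radius` become negatives;
    anything farther away is ignored. Both radii are assumed values.
    """
    cx, cy = center
    positive_bag, negatives = [], []
    for x, y in candidates:
        distance = hypot(x - cx, y - cy)
        if distance <= pos_radius:
            positive_bag.append((x, y))
        elif distance <= neg_radius:
            negatives.append((x, y))
    return positive_bag, negatives


positive_bag, negatives = label_instances(
    center=(10, 10),
    candidates=[(10, 10), (12, 11), (18, 10), (40, 40)],
)
```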
Details of VIVID data set.
| Video | Number of frames | Image size (pixels) |
|---|---|---|
| EgTest01 | 1821 | 640 × 480 |
| EgTest02 | 1301 | 640 × 480 |
| EgTest03 | 2571 | 640 × 480 |
| EgTest04 | 1833 | 640 × 480 |
| EgTest05 | 1764 | 640 × 480 |
Fig 7. Moving target segmentation for aerial images. The first row shows the original images, the second row the frame difference technique, and the third row our proposed segmentation method.
Fig 8. Visual comparison of moving target region segmentation for saliency-based methods and our proposed method.
Fig 9. Visual comparison of moving target detection methods.
The first row shows the original images, the second row the frame difference method, and the third row our proposed method.
Evaluation of the proposed method based on precision, recall, and F1-measure.
| Video | Precision (%) | Recall (%) | F1-measure (%) |
|---|---|---|---|
| EgTest01 | 96.73 | 98.85 | 97.78 |
| EgTest02 | 66.00 | 84.97 | 74.29 |
| EgTest03 | 80.68 | 84.94 | 82.76 |
| EgTest04 | 83.91 | 89.32 | 86.53 |
| EgTest05 | 68.11 | 82.62 | 74.67 |
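The F1-measure is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick consistency check, the short sketch below recomputes it from the precision and recall values in the table; the function name `f1_score` and the `results` dictionary are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) taken from the table above.
results = {
    "EgTest01": (96.73, 98.85),
    "EgTest02": (66.00, 84.97),
    "EgTest03": (80.68, 84.94),
    "EgTest04": (83.91, 89.32),
    "EgTest05": (68.11, 82.62),
}

for video, (precision, recall) in results.items():
    print(video, round(f1_score(precision, recall), 2))
```

Rounded to two decimals, the computed values match the F1-measure column of the table.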
Quantitative comparison of visual tracking methods and our proposed method based on F1-measure.
| Video | Variance Ratio | Color-based probabilistic | Wang et al. | Liang et al. | Shen et al. | Yin et al. | Lan et al. | AMS | WLMC | NFWL | CMS | The proposed method |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EgTest01 | 68.32 | 65.03 | 72.53 | 76.78 | 96.30 | 93.06 | 91.87 | 84.12 | 68.47 | 63.85 | 84.72 | |
| EgTest02 | 56.67 | 65.24 | 53.30 | 60.81 | 73.31 | 61.40 | 47.63 | 76.18 | 62.82 | 59.85 | 78.14 | |
| EgTest03 | 77.16 | 65.08 | 85.84 | 77.39 | 71.12 | 60.68 | 48.39 | 71.78 | 61.14 | 58.51 | 72.63 | 82.76 |
| EgTest04 | 84.61 | 59.71 | 83.52 | 81.40 | 65.31 | 52.29 | 70.08 | 68.62 | 53.04 | 52.73 | 74.54 | |
| EgTest05 | 82.01 | 71.15 | 83.87 | 80.56 | 50.13 | 75.62 | 71.96 | 63.96 | 58.80 | 56.75 | 70.08 | 74.67 |
Fig 10. Illustration of the quantitative comparison between other visual tracking methods and ours.
Fig 11. Youden's J values for each EgTest video and its sections.
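Youden's J statistic is conventionally defined as J = sensitivity + specificity - 1, so it ranges from 0 for a detector no better than chance to 1 for a perfect one. The sketch below computes it from confusion-matrix counts. The counts in the example are hypothetical, since the per-video counts are not listed here, and `youden_j` is an illustrative helper name.

```python
def youden_j(tp, fn, tn, fp):
    """Youden's J = sensitivity + specificity - 1 from confusion counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity + specificity - 1.0


# Hypothetical counts for one video section (not taken from the paper).
j_value = youden_j(tp=90, fn=10, tn=80, fp=20)
```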