| Literature DB >> 35323870 |
Souradeep Chakraborty, Dimitris Samaras, Gregory J. Zelinsky.
Abstract
The factors determining how attention is allocated during visual tasks have been studied for decades, but few studies have attempted to model the weighting of several of these factors within and across tasks to better understand their relative contributions. Here we consider the roles of saliency, center bias, target features, and object recognition uncertainty in predicting the first nine changes in fixation made during free viewing and visual search tasks in the OSIE and COCO-Search18 datasets, respectively. We focus on the latter-most and least familiar of these factors by proposing a new method of quantifying uncertainty in an image, one based on object recognition. We hypothesize that the greater the number of object categories competing for an object proposal, the greater the uncertainty of how that object should be recognized and, hence, the greater the need for attention to resolve this uncertainty. As expected, we found that target features best predicted target-present search, with their dominance obscuring the use of other features. Unexpectedly, we found that target features were only weakly used during target-absent search. We also found that object recognition uncertainty outperformed an unsupervised saliency model in predicting free-viewing fixations, although saliency was slightly more predictive of search. We conclude that uncertainty in object recognition, a measure that is image computable and highly interpretable, is better than bottom-up saliency in predicting attention during free viewing.
Entities:
Mesh:
Year: 2022 PMID: 35323870 PMCID: PMC8963662 DOI: 10.1167/jov.22.4.13
Source DB: PubMed Journal: J Vis ISSN: 1534-7362 Impact factor: 2.004
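The abstract's hypothesis, that uncertainty grows with the number of object categories competing for a proposal, reads naturally as the Shannon entropy of each proposal's class posterior (the record does not quote the exact formula, so the following is an assumed formalization, and the "label-entropy" model named in Figure 4 suggests entropy is at least one variant the authors tested):

```latex
H(o) = -\sum_{c=1}^{C} p(c \mid o)\,\log p(c \mid o)
```

Here p(c | o) is the probability that proposal o belongs to category c. H(o) is zero when a single category dominates and maximal (log C) when all C categories compete equally, matching the claim that more competition should demand more attention to resolve.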
Figure 1. Examples of MaskRCNN-generated object proposal bounding boxes, shown with the corresponding uncertainty maps and the ground-truth fixation-density maps.
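A minimal sketch of how such per-proposal uncertainty could be rendered into a pixel map like those in Figure 1. It assumes access to each proposal's full class-probability vector; note that the standard torchvision Mask R-CNN output reports only the top-scoring class per detection, so obtaining the full posterior requires reading the classifier's softmax directly. The box accumulation and normalization choices below are illustrative, not the authors' exact method:

```python
import numpy as np

def uncertainty_map(boxes, class_probs, image_shape):
    """Rasterize per-proposal recognition uncertainty into a pixel map.

    boxes       : (N, 4) array of [x1, y1, x2, y2] proposal boxes.
    class_probs : (N, C) array of per-proposal class probabilities
                  (each row sums to 1).
    image_shape : (height, width) of the input image.
    """
    h, w = image_shape
    umap = np.zeros((h, w), dtype=np.float64)
    # Shannon entropy of each proposal's class posterior: high when many
    # categories compete for the proposal, low when one dominates.
    entropy = -np.sum(class_probs * np.log(class_probs + 1e-12), axis=1)
    for (x1, y1, x2, y2), H in zip(boxes.astype(int), entropy):
        # Accumulate entropy over the proposal's extent; overlapping
        # uncertain proposals reinforce each other.
        umap[max(y1, 0):min(y2, h), max(x1, 0):min(x2, w)] += H
    if umap.max() > 0:
        umap /= umap.max()  # normalize to [0, 1] for cross-image comparison
    return umap
```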
Figure 2. The four priority maps (center bias, saliency, uncertainty, and target) shown with the original input image (leftmost) and the ground-truth fixation-density map (rightmost) for three representative trials in free viewing (top), TP search (middle, where the target objects are a clock, car, and bowl in rows 1–3, respectively), and TA search (bottom, where the target objects are a car, bottle, and cup in rows 1–3, respectively).
Figure 3. Model predictions for the first nine new fixations showing the relative importance (z-statistic of the priority map in a GLMM analysis) of object recognition uncertainty, bottom-up saliency, target features, and center bias in free viewing (A), TP search (B), and TA search (C). Brightness codes greater contribution. Instances show the proportion of images contributing to each fixation prediction. Note that instances sum to 1 over the column and that the factor weights sum to 1 over each row.
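The z values in Figure 3 and the table below come from a GLMM in which the four priority maps jointly predict whether locations were fixated. As a simplified sketch of where such z-statistics come from, the fixed-effects part can be fit as a logistic regression in statsmodels; the paper's GLMM additionally includes grouping structure (e.g., per-image random effects), which this sketch drops, and the long-format data layout and column names here are assumptions:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumed long format: one row per image location, with a binary
# 'fixated' outcome for the n-th fixation and the value of each
# priority map at that location.
df = pd.read_csv("priority_map_samples_fix1.csv")  # hypothetical file

# Logistic regression of fixation on the four priority maps.
fit = smf.logit(
    "fixated ~ uncertainty + saliency + target + center_bias", data=df
).fit()

# Wald z-statistics per predictor (coefficient / standard error),
# analogous to the factor weights plotted in Figure 3.
print(fit.tvalues)
print(fit.bse)      # standard errors
print(fit.pvalues)  # p values
```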
Standard errors, z values, and p values for significance tests conducted on the different priority maps from our GLMM analysis.
[Table values not recovered. Columns: Fixation (1–9), Uncertainty, Saliency, Target (search conditions only), and Center Bias; sections: free viewing, target-present search, and target-absent search. Only the target-absent fixation 9 standard errors survived: SE = 0.144 (Uncertainty), SE = 0.144 (Saliency), SE = 0.228 (Target), SE = 0.140 (Center Bias).]
Figure 4. NSS prediction accuracy as a function of fixation number for the label-entropy, pixel-wise, and object recognition uncertainty models.
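NSS (Normalized Scanpath Saliency) scores a prediction map by z-scoring it over all pixels and averaging the result at the fixated locations, so chance performance is 0 and higher is better. A minimal sketch of the standard metric (the fixation input format is an assumption):

```python
import numpy as np

def nss(pred_map, fixations_xy):
    """Normalized Scanpath Saliency of a prediction map.

    pred_map     : 2D array (H, W) of predicted priority values.
    fixations_xy : iterable of (x, y) pixel coordinates of fixations.
    """
    # Z-score the map so scores are comparable across models and images.
    z = (pred_map - pred_map.mean()) / (pred_map.std() + 1e-12)
    xs, ys = np.transpose([(int(x), int(y)) for x, y in fixations_xy])
    # Mean normalized value at the fixated pixels.
    return float(z[ys, xs].mean())
```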
Figure 5. Predictions from three object recognition uncertainty models (middle three columns) and ground-truth fixation-density maps (right) superimposed over three representative images (left).