Niklas Wilming1,2,3,4,5, Tim C Kietzmann1,6, Megan Jutras2,3,5, Cheng Xue7, Stefan Treue7,8,9, Elizabeth A Buffalo2,3,5, Peter König1,4.
Abstract
Oculomotor selection exerts a fundamental impact on our experience of the environment. To better understand the underlying principles, researchers typically rely on behavioral data from humans and electrophysiological recordings in macaque monkeys. This approach rests on the assumption that the same selection processes are at play in both species. To test this assumption, we compared the viewing behavior of 106 humans and 11 macaques in an unconstrained free-viewing task. Our data-driven clustering analyses revealed distinct human and macaque clusters, indicating species-specific selection strategies. Yet, cross-species predictions were found to be above chance, indicating some level of shared behavior. Analyses relying on computational models of visual saliency indicate that such cross-species commonalities in free viewing are largely due to similar low-level selection mechanisms, with only a small contribution by shared higher-level selection mechanisms and with consistent viewing behavior of monkeys being a subset of the consistent viewing behavior of humans.
Keywords: human macaque comparison; low-level salience; oculomotor control; overt visual attention
Year: 2017 PMID: 28077512 PMCID: PMC5942390 DOI: 10.1093/cercor/bhw399
Source DB: PubMed Journal: Cereb Cortex ISSN: 1047-3211 Impact factor: 5.357
Figure 1. Study overview. (A) Nine example images from the categories natural scenes, urban scenes, and fractal scenes. (B) One example stimulus with one monkey (blue) and one human (red) eye-movement trace. The next 3 plots show the density of human fixations on the example image, the density of monkey fixations, and a predicted saliency map for the example stimulus. (C) The computation of AUC values. Left: Feature values at fixated locations (red) and non-fixated control locations (black) are classified as fixated or not fixated by a simple threshold (green dotted line). Moving the threshold and plotting the false alarm rate (FPR) against the hit rate (TPR) generates a receiver operating characteristic (ROC) curve, which is shown on the right. The area under this curve (AUC) is a measure of classification quality. (D) Different predictors and comparisons in this study.
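The threshold sweep described in panel C can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `roc_auc` and the toy feature values are assumptions made for the example.

```python
def roc_auc(fixated, control):
    """Area under the ROC curve obtained by sweeping a threshold.

    `fixated` and `control` are feature values at fixated and at
    non-fixated control locations (hypothetical inputs).
    """
    # Every observed value is a candidate threshold, from high to low.
    thresholds = sorted(set(fixated) | set(control), reverse=True)
    points = [(0.0, 0.0)]  # ROC curve starts at FPR = 0, TPR = 0
    for t in thresholds:
        tpr = sum(v >= t for v in fixated) / len(fixated)  # hit rate
        fpr = sum(v >= t for v in control) / len(control)  # false alarm rate
        points.append((fpr, tpr))
    # Integrate the curve with the trapezoidal rule.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy feature values, invented for illustration only.
fixated_values = [0.9, 0.8, 0.4]
control_values = [0.7, 0.3, 0.2]
print(roc_auc(fixated_values, control_values))
```

An AUC of 0.5 corresponds to chance (the feature does not separate fixated from control locations), and 1.0 corresponds to perfect separation.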
Figure 2. Similarity of viewing behavior between all observers for different categories of stimuli. (A) Schematic of a 4 × 4 similarity matrix to visualize who predicts whom in the full matrix below; for example, H->M depicts areas where human observers predict monkey observers. (B) Full similarity matrix constructed from AUC values between pairs of observers (left = natural scenes, center = urban scenes, right = fractals). The intensity of individual points encodes how well one observer predicts another. The species is encoded by different colors on the side of each matrix (purple = humans; red = monkeys). Rows and columns are sorted according to the results of hierarchical clustering. The dendrogram on the top shows the cluster structure; links are colored according to their species composition (purple = only humans; red = only monkeys; black = mixed). Monkeys and humans are sorted into different clusters by the hierarchical clustering algorithm.
Figure 3. Predicting monkey and human fixation locations with different predictors. Columns show results for natural, urban, and fractal scenes (from left to right). The bar plots on the top show mean values and 95% CIs for individual predictors. CIs are computed by repeatedly sampling 11 observers to allow better comparison between human and monkey data. Three scatter plots at the bottom show individual comparisons. Red dots show individual monkeys (monkeys are indexed by hue). Green contours show a density estimate of the distribution of human AUC; each shade increases the contained proportion of observers by 10%. Bottom right plots in each panel show within-human estimates when the number of predicting observers is subsampled. Dashed blue lines indicate our adjusted estimate of the within-human consistency had only 11 observers participated.
Summary of AUC scores and percentages relative to the within-species consistency of different predictors. Percentages are given in parentheses after the corresponding AUC score.
| Category | Within: Human | Within: Monkey | Cross: Human | Cross: Monkey | Salience: Human | Salience: Monkey | Salience + cross: Human | Salience + cross: Monkey |
|---|---|---|---|---|---|---|---|---|
| Natural | 0.658 | 0.598 | 0.579 (50%) | 0.551 (51%) | 0.591 (57%) | 0.565 (67%) | 0.60 (66%) | 0.574 (75%) |
| Urban | 0.767 | 0.680 | 0.666 (62%) | 0.626 (70%) | 0.669 (63%) | 0.644 (80%) | 0.69 (69%) | 0.653 (85%) |
| Fractal | 0.727 | 0.664 | 0.672 (76%) | 0.614 (70%) | 0.646 (64%) | 0.640 (85%) | 0.67 (76%) | 0.649 (91%) |
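The table does not spell out how the percentages are derived. One reading that is consistent with the listed values is to express each AUC's distance from chance (0.5) as a fraction of the within-species AUC's distance from chance. The sketch below assumes that formula; the function name `relative_to_within` is hypothetical.

```python
CHANCE = 0.5  # AUC of a predictor that performs at chance level


def relative_to_within(auc, within_auc, chance=CHANCE):
    """Percentage of the within-species consistency reached by a predictor.

    Assumed formula (not stated in the source):
    100 * (auc - chance) / (within_auc - chance)
    """
    return 100 * (auc - chance) / (within_auc - chance)


# AUC values taken from the table above.
print(round(relative_to_within(0.579, 0.658)))  # natural, human, cross
print(round(relative_to_within(0.644, 0.680)))  # urban, monkey, salience
print(round(relative_to_within(0.649, 0.664)))  # fractal, monkey, salience + cross
```

Under this assumption the three calls give 50, 80, and 91, matching the corresponding table entries. Entries whose AUC is reported to only two decimals (0.60, 0.69, 0.67) can differ from the listed percentage by a few points because of rounding.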
Figure 4. Comparison of bottom-up salience across species. Panels show feature-fixation AUC values for humans and monkeys (see Materials and Methods for feature definitions). Red lines show the best-fit linear model that explains monkey feature-fixation AUCs based on human feature-fixation AUCs. The shaded blue area contains 95% of all bootstrapped regression lines created by repeatedly subsampling human and monkey observers with replacement. Small insets show the bootstrapped distribution of slopes of the linear regression. The area between the red bars contains 95% of bootstrapped slopes. Columns show different categories (naturals, urbans, and fractals).
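The bootstrap over observers described in this caption might look roughly as follows. This is a sketch under simplifying assumptions, not the authors' procedure: each observer is represented by a list of per-feature AUC values, observers are resampled with replacement within each species, and an ordinary least-squares slope is refit on the resampled species means. All names and the toy data are hypothetical.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Slope of the least-squares line predicting y from x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def resampled_feature_means(per_observer, rng):
    """Resample observers with replacement, then average each feature's AUC."""
    sample = [rng.choice(per_observer) for _ in per_observer]
    return [mean(column) for column in zip(*sample)]


def bootstrap_slope_interval(human, monkey, n_boot=1000, seed=0):
    """Interval containing the central 95% of bootstrapped slopes."""
    rng = random.Random(seed)
    slopes = sorted(
        ols_slope(resampled_feature_means(human, rng),
                  resampled_feature_means(monkey, rng))
        for _ in range(n_boot)
    )
    low = slopes[int(0.025 * n_boot)]
    high = slopes[min(int(0.975 * n_boot), n_boot - 1)]
    return low, high


# Toy data, invented for illustration: rows are observers, columns are features.
human = [[0.52, 0.60, 0.71], [0.50, 0.63, 0.69], [0.51, 0.58, 0.70]]
monkey = [[0.51, 0.57, 0.66], [0.50, 0.59, 0.64]]
print(bootstrap_slope_interval(human, monkey))
```

A slope near 1 would indicate that a feature predicts fixations about equally well in both species, whereas a slope below 1 would indicate weaker feature-fixation relationships in monkeys than in humans.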
Figure 5. Combination of different models to predict monkey fixation locations. (A) Cartoon drawing to exemplify how 2 models form a combined model. Models A and B both project data into a space in which samples are classified as fixated or non-fixated based on their position with respect to a threshold (dashed lines). The combined model weights and sums both models to potentially improve its prediction. The slope of the resulting decision function (green line) depicts the relative impact of the 2 combined models. (B) Performance of the combined model on different stimulus categories (naturals, urbans, fractals). Color encodes the log-likelihood ratio of observing fixations versus control locations for a specific combination of predictor values in the data set (e.g. log(P(fix | cross prediction, within salience)/P(control | cross prediction, within salience)) in the top row and log(P(fix | cross-salience, within salience)/P(control | cross-salience, within salience)) in the bottom row); that is, reddish colors imply more actual fixations and bluish colors more non-fixated locations. Top row: Each panel shows the separating decision function of a within-species bottom-up model and the cross-species prediction (vertical and horizontal dashed lines, respectively). Diagonal green lines show the decision function of the model that combines within-species saliency and the cross-species prediction. The bottom row shows decision functions for within-species salience and cross-species salience models. Here, the decision function (green line) for a model that combines within-species and cross-species salience falls onto the function for the within-species salience model; that is, the combined model relies exclusively on the species-specific saliency model and neglects the respective model derived from the other species.
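The idea of weighting and summing two predictors can be illustrated with a small sketch. The caption does not say how the weights were fitted, so the example below swaps in a plain grid search over a single mixing weight and scores each candidate by AUC. The function names and the toy predictor values are assumptions for illustration only.

```python
def auc(fixated, control):
    """Probability that a fixated sample outscores a control sample (ties count half)."""
    wins = sum((f > c) + 0.5 * (f == c) for f in fixated for c in control)
    return wins / (len(fixated) * len(control))


def combine(model_a, model_b, w):
    """Weighted sum of two predictor values per location; w is the weight of model A."""
    return [w * a + (1 - w) * b for a, b in zip(model_a, model_b)]


def best_mixing_weight(fix_a, fix_b, ctl_a, ctl_b, steps=20):
    """Grid search (a stand-in for the unknown fitting method) over w in [0, 1]."""
    candidates = [i / steps for i in range(steps + 1)]
    scored = [(auc(combine(fix_a, fix_b, w), combine(ctl_a, ctl_b, w)), w)
              for w in candidates]
    best_auc, best_w = max(scored)
    return best_w, best_auc


# Toy values, invented for illustration: model A is informative, model B is not.
fix_a, ctl_a = [0.9, 0.8], [0.1, 0.2]
fix_b, ctl_b = [0.1, 0.9], [0.9, 0.1]
print(best_mixing_weight(fix_a, fix_b, ctl_a, ctl_b))
```

In a two-dimensional plot of model A against model B, the combined score `w * a + (1 - w) * b` has a straight-line decision boundary whose slope is `-w / (1 - w)`. When nearly all weight falls on one model, the boundary coincides with that model's own threshold line, which is the situation the bottom row of panel B describes.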