Literature DB >> 23801946

Heading recovery from optic flow: comparing performance of humans and computational models.

Andrew J Foulkes¹, Simon K Rushton, Paul A Warren.

Abstract

Human observers can perceive their direction of heading with a precision of about a degree. Several computational models of the processes underpinning the perception of heading have been proposed. In the present study we set out to assess which of four candidate models best captured human performance; the four models we selected reflected key differences in terms of approach and methods to modelling optic flow processing to recover movement parameters. We first generated a performance profile for human observers by measuring how performance changed as we systematically manipulated both the quantity (number of dots in the stimulus per frame) and quality (amount of 2D directional noise) of the flow field information. We then generated comparable performance profiles for the four candidate models. Models varied markedly in terms of both their performance and similarity to human data. To formally assess the match between the models and human performance we regressed the output of each of the four models against human performance data. We were able to rule out two models that produced very different performance profiles to human observers. The remaining two shared some similarities with human performance profiles in terms of the magnitude and pattern of thresholds. However none of the models tested could capture all aspect of the human data.

Entities: CellLine Chemical Disease Gene Species

Keywords: computational models; heading; optic flow

Year: 2013 PMID： 23801946 PMCID： PMC3689323 DOI： 10.3389/fnbeh.2013.00053

Source DB: PubMed Journal: Front Behav Neurosci ISSN： 1662-5153 Impact factor: 3.558

Introduction

Optic flow is the pattern of optical motion available at the eye during relative movement between the observer and the scene. It is known that primates are sensitive to the stereotypical patterns of optic flow which arise when an observer moves through a largely stationary scene (Figure 1). This sensitivity has been demonstrated using psychophysics (e.g., Warren and Hannon, 1988; Snowden and Milne, 1997), neurophysiology (e.g., Duffy and Wurtz, 1991; Wurtz, 1998) and neural imaging (e.g., Morrone et al., 2000; Smith et al., 2006). See Lappe et al. (1999) for a review.

Figure 1

Schematic optic flow fields available at the eye when locomoting towards doorways. (A) Observer locomoting towards left “no entry” doorway. (B) Observer locomoting towards right “entrance” doorway. In both panels direction of heading (instantaneous direction of locomotion) indicated by the centre (“focus of expansion”) of the radial flow field. Several roles for the neural processing of optic flow have been put forward. For example, Lee and Aronson (1974) found that movement of an artificial room around a stationary observer could cause infants to sway, and in extreme cases fall over, providing strong evidence that optic flow plays a role in the control of posture. An additional role proposed suggests that that optic flow drives rapid eye movements which act to stabilize the foveal image and maintain correspondence between images on the two retinae (e.g., see Busettini et al., 1997). Consequently, this optic flow-driven stabilization process helps to preserve foveal visual acuity and stereo vision during observer movement (Angelaki and Hess, 2005). More recently, in the flow parsing hypothesis, it has been suggested that optic flow processing plays an important role in the assessment of scene-relative object movement during self movement (Rushton and Warren, 2005; Rushton et al., 2007; Warren and Rushton, 2007, 2008, 2009a,b; Matsumiya and Ando, 2009; Pauwels et al., 2010; Warren et al., 2012). Although many roles for optic flow processing have been suggested, the vast majority of work in the literature to date has focused on a single potential role in the guidance of locomotion. Following Grindley (see Mollon, 1997) and contemporaneously with Calvert (1950), Gibson (1950) championed the idea that the primary role of optic flow is used in the guidance of locomotion (however see Llewellyn, 1971; Rushton et al., 1998 for alternative hypotheses). During forward movement of the observer a radial pattern of optic motion is available at the eye. The centre of the radial pattern, termed the Focus of Expansion (FoE) coincides with the direction of locomotion of the observer relative to the environment. For example, if the FoE is coincident with a doorway then an observer regulating direction of locomotion to maintain the FoE in the same location will reach the doorway (see Figure 1). In a seminal publication by W. Warren (Warren and Hannon, 1988) it was demonstrated that stationary observers can judge simulated direction of locomotion with a precision of a degree or two based only on optic flow information consistent with self movement. A number of models of the computations underlying heading estimation (or more generally methods for estimating eye or camera movement parameters from sequences of static images) have been proposed (e.g., Longuet-Higgins and Prazdny, 1980; Heeger and Jepson, 1992; Perrone, 1992; Lappe and Rauschecker, 1993; Royden, 1997; Beardsley and Vaina, 1998; Beintema and van den Berg, 1998; Fitzgibbon and Zisserman, 1998; Grossberg et al., 1999; Wang and Cutting, 1999; Davison, 2003). These models vary markedly in terms of approach and aims. For example a set of models (primarily published in the computational vision literature) aim to estimate parameters of camera movement as accurately as possible from the flow field (or, equivalently, sequences of static images). In these cases a novel mathematical treatment of the information available in the flow is commonly the primary inspiration for the model (e.g., Longuet-Higgins and Prazdny, 1980; Heeger and Jepson, 1992; Fitzgibbon and Zisserman, 1998; Davison, 2003). For several other models (most commonly published in the human vision or biological literature), the aim is to model human performance in heading estimation, including circumstances in which performance is poor. These models are often more clearly inspired by known properties of the neural substrates thought to underpin heading estimation (e.g., Perrone, 1992; Beintema and van den Berg, 1998). Further models have, at least partially, reconciled computer vision models with what is known about human motion processing (e.g., Royden, 1997 for Longuet-Higgins and Prazdny, 1980; Lappe and Rauschecker, 1993 for Heeger and Jepson, 1992). We might also add a further class of model which uses a connectionist approach (e.g., Beardsley and Vaina, 1998; Grossberg et al., 1999); this class is clearly related to the categories above in that it seeks to reflect neurophysiological constraints, however it is distinct in that the known neurophysiological constraints emerge as properties of the connectionist network rather than being used at the outset to underpin the architecture of the model. Lastly we note that although the majority of models take the instantaneous monocular 2D vector flow field generated from combinations of rotations and translations of the eye, it has been shown that humans can take advantage of stereo depth information (van den Berg and Brenner, 1994; Rushton et al., 1999) and some models take stereoscopic flow fields as their input (e.g., Wang and Duncan, 1996). For a thorough treatment of different classes of model for heading estimation see Lappe (2000), and see Raudies and Neumann (2012), for a recent review and evaluation. Here we investigate how well candidate models of heading perception capture human performance. We first assess how the performance of human observers varies as a function of the quantity (i.e., number of flow dots per frame) and quality (i.e., amount of 2D directional noise in the flow field) of motion information in the flow field. This provides a reference performance profile. We then select four models that span the approaches to modelling heading estimation and compare their performance to that of human observers; we generate similar performance profiles for our candidate models and compare them to the reference human profile. The quality and quantity manipulations are motivated by the fact that models of heading recovery differ markedly in their treatment of the motion information in the flow field. While some implement inherently local operations, such as motion vector differencing (Longuet-Higgins and Prazdny, 1980), others involve global processing such as template matching flow over the entire field of view (Perrone, 1992). Consequently, manipulating the quantity and quality of the information in the flow field should have differential effects on the models. In the first section we assess how human performance is affected by the quality and quantity of information in the flow field. In the second section we present the same stimuli to four heading models. The performance profiles of the four models are then compared to human performance.

Experiment—human observers

Methods

Participants

Nineteen participants (6 male, 13 female) took part in the experiment. All were recruited from the School of Psychological Sciences, University of Manchester. Two of the participants were authors (AJF & PAW), and the other 17 (2 Postdoctoral Research Associates, 5 Ph.D. students, and 10 undergraduate students) were naïve. All participants were recruited and tested in line with the Declaration of Helsinki. The study was approved by the ethics committee in the School of Psychological Sciences, University of Manchester.

Apparatus

Participants sat in a dark room with their chin on a chinrest such that their line of sight was directed at the centre of a CRT display (Viewsonic pf255) running at 100 Hz with a resolution of 1280 × 1024 and subtending a visual angle of approximately 40° × 32°. The display was approximately 57 cm from the observer's eyes. The visible part of the display casing was obscured with irregular shaped black card, which minimized stray light reflecting on the casing. The stimuli were generated under Windows 7 on a Dell Optiplex 780 with an nVidia GeForce 9600GT using x16 anti-aliasing. The experiments were written using Lazarus, a free open source development system for Pascal (http://www.lazarus.freepascal.org/) together with the JediSDL libraries (http://pascalgamedevelopment.com/jedi-sdl/) which facilitate use of OpenGL in Pascal.

Stimuli and task

Each trial consisted of three parts. The first was the fixation phase, in which the target line (a 1° vertical red line placed at one of four target locations) alone was presented for 1 s and participants were instructed to fixate it. In the second part, the flow phase, the target line turned into a small green annulus with outer radius of 0.1° and inner radius of 0.05°. While the observers maintained fixation the flow field was presented on the screen for 2 s. Flow fields were comprised of limited lifetime (250 ms) red dots, each of 0.1° radius. The dots moved in an expanding radial pattern consistent with forwards translation of the observer at around 1 m/s through a cloud of dots at distances randomly assigned in a range between 0.5 and 4.5 m from the observer (see Figure 2A).

Figure 2

Schematic illustration of radial optic flow fields presented in the experiments. In both panels, there are 200 dots, with the target/fixation point located at 2° to the left of the centre of the field of view. In (A), the FoE is located 4° to the right of the target position and there is no noise present. In (B), the FoE is located 0.2° to the left of the target position and there is directional noise present (15° s.d.—Noise Level 2). The third and final part of the trial was the response phase during which observers were once again presented with the red target line (same as in the fixation phase) in isolation, which remained onscreen until the observer made a response. The observer's task was to indicate whether they were passing to the left or right of the target line.

Choice of flow fields

We made a decision to only test performance for simple radial flow fields. This decision was made for several reasons. First, the candidate models were only designed for the case of forward translation of the observer. Second, it makes logical sense to start with the most basic optic flow stimulus. If model performance is markedly different in the simplest case then there is no need to consider it any further as a candidate. We did not consider the effects of adding simulated gaze rotation during forward translation. Although, historically there has been much interest in such flow fields because of the debate regarding the role of extra-retinal information in heading recovery (e.g., Warren and Hannon, 1988; Royden et al., 1992), today these issues have been largely resolved (e.g., see Li and Warren, 2002).

Design

The experimental design was based on that of Warren et al. (1988). We manipulated two variables, the quantity (number of dots—5, 50, 100, 200 dots per frame) and quality [dot noise—a random independent additive perturbation of the 2D direction of each flow field dot by sampling from a zero-mean Gaussian distribution with standard deviations set at either 0° (no noise), or 7.5° (noise level 1), or 15° (noise level 2)] (see Figure 2B). Two other variables were systematically varied, these were target location (±2°, ±4°) and FoE offset relative to the target location (±0.2°, ±0.5°, ±1°, ±2°, ±4°). It should be noted that given the limited lifetime of our dots (250 ms) and an estimate of visual persistence of around 100 ms (see Di Lollo, 1980) we expect that the observers perceive approximately 40% more points than are present on any single frame. Full factorial combination of the variables resulted in 480 individual conditions and observers saw each condition twice over two 30 min experimental sessions. For the data analysis, similar to Warren et al. (1988) we collapsed the data over the positive and negative FoE offsets and then over target locations, giving 16 repetitions for each of 5 (FoE offsets) × 4 (number of dots) × 3 (dot noise) = 60 conditions.

Analysis

Analysis of the data was based on that in Warren et al. (1988). For each of the 12 (number of dots × dot noise) flow field conditions, participant responses were converted to % correct scores and cumulative Gaussian psychometric functions were fitted to these scores as a function of the FoE offset. Thresholds were defined as the FoE offset for which participants performed at the 75% correct level. In order to minimize the impact of individual differences between participants, thresholds were normalized within individual. The normalization procedure involved first calculating the average threshold for each observer over the 12 flow field conditions (the grand mean), and then dividing each threshold by the grand mean. When averaging over observer data we also took into account the quality of the psychometric function fit to each participant's heading response data. Specifically, the mean normalized threshold for each of the twelve conditions was calculated as a weighted linear sum of the individual participant thresholds. Weights were defined based on the inverse of the RMS error in the psychometric function fit, normalized to sum to 1 (see Appendix for details). Finally we converted the normalized data back to an angular measure by multiplying it by the grand mean of thresholds calculated over all participants.

Results—humans

Weighted thresholds for the twelve flow field conditions are shown in the Figure 3A. The magnitude of the thresholds is consistent with those found in previous studies (e.g., Warren et al., 1988).

Figure 3

(A) Thresholds and (B) associated standard deviations for the 12 flow field conditions. Error bars represent ±1 s.e. Squares represent no noise conditions, circles for noise level 1 and diamonds for noise level 2. Considering the number of dots per frame manipulation first, we see that performance is markedly worse below 50 dots per frame but that above this level performance is relatively constant. This finding was supported by statistical analyses in the form of multiple weighted t-tests (Bland and Kerry, 1998) conducted within each noise condition revealing highly significant differences between the thresholds in the 5 and 50 dots per frame conditions for all three noise conditions but no differences between thresholds in the other dots per frame conditions (see Table 1).

Table 1

Results of weighted .

	No noise	Noise level 1	Noise level 2
(5,50)	0.0040^*	0.0081^*	0.0024^*
(50,100)	0.3132	0.6844	0.5065
(100,200)	0.7453	0.4718	0.7274

For each pair of sequential number of dots per frame levels, p-values are reported at the noise level given in the column heading. Asterisks indicate that the fit was significant.

Results of weighted . For each pair of sequential number of dots per frame levels, p-values are reported at the noise level given in the column heading. Asterisks indicate that the fit was significant. Turning to the noise manipulation we see that there is a clear effect of noise such that increasing the noise level leads to increased thresholds. Beyond the 50 dots condition the effect of moving from the no noise condition to noise level 2 is an approximate doubling in threshold (from around 0.5° to around 1.0° on average). For the 5 dots condition, the effect of noise is much smaller, with little or no change in the threshold measured. These observations are supported by weighted t-tests (Bland and Kerry, 1998) conducted within number of dots per frame conditions revealing significant differences between the no noise and noise level 2 conditions in all dots conditions except for when there were 5 dots per frame (Table 2).

Table 2

Results of weighted .

	5	50	100	200
NN, NL1	0.3135	0.0019^*	0.0230^*	0.3481
NN, NL2	0.4004	<0.0001^*	0.0034^*	0.0090^*
NL1, NL2	0.9110	0.0414^*	<0.0001^*	0.0402^*

For each pair of noise levels, p-values are reported at the number of dots per frame level given in the column heading. Asterisks indicate that the fit was significant.

Results of weighted . For each pair of noise levels, p-values are reported at the number of dots per frame level given in the column heading. Asterisks indicate that the fit was significant. For later comparison with the heading models we also plot the within subject weighted standard deviations of the human data as a function of the manipulations in Figure 3B. Primarily, the human s.d.s follow the pattern of the thresholds, increasing when there is either more noise or there are fewer dots in the field. Because the most significant change in performance occurs between 5 and 50 dots we conducted a follow up experiment to explore this range in more detail. The second experiment was identical to the first except that the number of dots per frame levels were set as (5, 15, 25, 35, 50). The results for this experiment with 20 additional participants (three of whom also took part in the first experiment) are shown in Figure 4A. Note that the data presented in the 5 and 50 dots per frame conditions were obtained by collating data from the 36 different participants taking part in these two conditions in both the primary and follow up experiments). In Figure 4B we re-plot the data from the primary experiment with the data in Figure 4A inserted (note again that the data at 5 and 50 dots per frame are similar to those in Figure 3A, i.e., they have been calculated over the 36 participants who saw these two conditions over the two experiments). Again, for later comparison with the models, in Figures 4C,D we also show the within subjects weighted standard deviations of the human data as a function of the flow quality and quantity manipulations.

Figure 4

(A) Thresholds plots for the follow up experiment for the dot range 5–50. (B) combined data, (C) standard deviation for the 5–50 data, and (D) standard deviation for the combined data. In all cases, the squares represent no noise, circles noise level 1, and diamonds noise level 2. Error bars represent ±1 standard error. The data in Figure 4A suggest that, similar to the data in the primary experiment the thresholds decrease and stabilize as the quantity of flow information increases and that the stabilization begins to occur somewhere in the 15–35 dots per frame region. It is worth noting that in Figure 4B there appears to be a further drop in thresholds between 50 and 100 dots. This contradicts the pattern of data found in the primary experiment in which performance had stabilized by 50 dots per frame. We think that this is at least partially due to a mismatch in thresholds between the preliminary and follow up experiments in the 50 dots per frame condition such that those in the follow up experiment were rather larger than in the main experiment. Consequently the drop in thresholds observed in this range may simply be a consequence of collecting the data in two separate experiments rather than reflecting some residual improvement in performance in that range.

Discussion—human data

By manipulating the quantity (number of dots per frame) and quality (amount of directional noise) of information in the optic flow field we generated a performance profile for human observers. From the performance profile we note the following features for evaluating the four candidate heading models. As the number of dots per frame is increased a critical range is reached [between 15 and 35 dots per frame—see Warren et al. (1988) who report similar threshold values]. Before reaching that range heading thresholds exhibit a marked dependence on the quantity of flow in the field such that as the quantity is reduced the thresholds increase rapidly. After that range as the quantity is increased thresholds remain relatively stable. Last, before reaching the critical range there is little evidence of a relationship between heading performance and the amount of directional noise in the flow field whereas afterwards there is a marked relationship between heading performance and directional noise (as noise increases so do the thresholds). In the following sections we describe a number of heading models and assess their compatibility with these features.

Heading models

We begin with a brief description of the models tested. In the interests of brevity the descriptions presented here are deliberately concise, further details can be found in the Appendix together with simulations which demonstrate our implementations reproduce the behavior of the original models.

Candidate models

We have chosen four heading models which were first described in: (1) Longuet-Higgins and Prazdny (1980); (2) Perrone (1992); (3) Heeger and Jepson (1992); and (4) Wang and Cutting (1999). The models will be referred to as LHP80, P92, HJ92, WC99, respectively. In all cases the input to the model was a 2D optic flow field.

Longuet-Higgins and Prazdny (1980) (LHP80)

The motivation for LHP80 was primarily computational, i.e., it is based on a mathematical treatment of the available information in order to recover camera movement parameters as accurately as possible. Royden (1997) proposed a biological implementation of this algorithm using properties of MT-like cells. This model is designed to recover both the translational and rotational velocities of the observer (camera) as it moves through the scene, LHP80 searches for the FoE, (referred to as the vanishing point in LHP80). In order to recover the FoE, the model takes advantage of the fact that for two points in the optic flow field at the same location in the image plane but at different depths in the scene, the rotational components of the flow field are identical but the translational components differ. The model applies a local differencing operation on the velocities of pairs of points in the optic flow field at the same location in the image plane but at different depths in the scene. As a consequence, the FoE can be recovered relatively simply using basic geometry provided sufficiently many pairs (2) of coincident points in the image plane can be identified. Once the FoE has been located, the rest of the unknowns, i.e., the 3 translational and (if present) 3 rotational velocity components can be determined. See the Appendix for more details. One pertinent issue for LHP80 is that it is assumed that there are at least 2 pairs of points which are co-incident in the image plane but at different depths in the scene. As a consequence this model might be expected to be sensitive to changes in the quantity of information in the flow field, i.e., it will make large errors in heading estimation when there are no (or not sufficiently many) co-incident points. Given the nature of the stimuli in the present experiment (random dot flow fields) it is possible that there will be no co-incident points present in the stimulus. Consequently, we implement a modified LHP80 which after a pair-wise comparison of points in the field selects those which are closest together in the image plane for use in the calculation of heading. A further issue for this model is that the analysis undertaken relies fundamentally on a local analysis of the velocity information (since it involves a local differencing operation). Consequently, it is possible that this model will suffer when noise is added to the flow field since it does not benefit from an integrative approach which can improve robustness to noise. However, the model could be improved by weighting and combining information across many pairs of points. Although LHP80 was not tested explicitly in Longuet-Higgins and Prazdny (1980), in the Appendix we demonstrate how accurate this model can be given appropriate velocity information.

Perrone (1992) (P92)

This model is based directly upon extensively studied (e.g., Zeki, 1980; Van Essen et al., 1981; Maunsell and Van Essen, 1983; Albright, 1984, 1989) properties of cells in MT and consequently it is biologically-inspired. The fundamental units of the model are local (i.e., relatively small receptive field) speed and direction tuned cells in MT. From these basic units more complex global (i.e., large receptive field) motion processing sensors (or templates) are created by pooling activity from appropriately tuned local units. Crucially, different combinations of local units at different locations in the visual field can be constructed to generate sensors tuned to different heading directions. It is suggested that these sensors reflect the properties of cells in MST in being tuned for global optic flow commensurate with observer translation. This is essentially a template-matching model with the global motion sensor (or template) which is most active taken to signal the current observer heading. We initially used templates tuned for heading directions over a 40 × 40° grid separated by 1° in the vertical and horizontal extents. This was done to match the notional resolution of other models implemented (see below). In Perrone (1992), several tests of model performance were carried out. In the Appendix we replicate one of these tests.

Heeger and Jepson (1992) (HJ92)

Similar to LHP80, Heeger and Jepson (1992) present a computational treatment of the problem of recovering camera movement (and scene structure) from optic flow information (although they do discuss how the algorithm might be implemented neurally). More, recently Lappe and Rauschecker (1993) have suggested that such a mechanism might be implemented in human MST. HJ92 uses what the authors refer to as a subspace method which involves re-expression of the equations of optic flow as a product between two matrices. The first matrix is dependent on the unknown translation component and a second matrix contains the unknown rotation and depth components. The authors then present a method for recovering a least-squares estimate of the translation component. This method involves the definition of a residual function which measures the extent to which a candidate translation is consistent with the flow vectors in the image. The candidate for which the consistency is maximized is taken as the best estimate of translation and can be used to subsequently estimate the rotation and depth components of the scene. A more thorough description including the mathematical formulation is presented in Appendix B. Heeger and Jepson (1992) tested the robustness of their model to noise and variations in the amount of motion information in the display. In the Appendix, we show that we can faithfully replicating those results. We also note that in the testing of their model the resolution for candidate headings was approximately 1°. We used a similar 1° resolution in our implementation.

Wang and Cutting (1999) (WC99)

This model stands out from the others presented thus far in that it is probabilistic in nature, i.e., it returns a distribution on possible heading directions defining the probability that each heading direction generated the flow field. Probabilistic models of perception are very common in the broader literature [e.g., for a review see Rao et al. (2002)] and consequently we felt it was important to include an example. The success of the model is based on a theorem which states that for any given heading direction if a pair of image points are converging then the heading direction cannot be between those two points (Wang and Cutting, 1999). Using this theorem the authors present a Bayesian analysis of heading estimation which looks at the angular velocities of pairs of points in the image to assess whether they are converging or diverging. By considering many pairs of points it is possible to build up a posterior distribution on the heading direction. The method used in this model is very simple from a computational point of view. However Wang and Cutting (1999) demonstrate that it performs relatively well assuming the flow field is not too sparse. As with the other candidate models, in the Appendix we present the results of simulations for comparison to sample results published in Wang and Cutting (1999). We also note that in the testing of their model, Wang and Cutting split the visual field into columns with a width of 1°, and we use this resolution in our implementation of the model.

Further information on heading models

Above we briefly describe four candidate models. Details of the implementations can be found in the Appendix. We also refer the interested reader to Lappe (2000) and a more recent and very thorough paper by Raudies and Neumann (2012) comparing the performance of 13 computational models of heading recovery (including variants of those described here). Raudies and Neumann (2012) rigorously characterize and evaluate model performance for a range of manipulations. Similarly to our study, the authors tested model sensitivity to noise and density manipulations. However, they also considered (among other things) simulated ego-motion type, noise model, robustness to statistical bias and depth range. The experimental methods for the model observers were very similar to those for human observers. However, some minor modifications were undertaken to simplify the implementation. First, the target location was fixed at the centre of the image such that FoE of the flow field was always offset to the side of the central target location. In the human judgment experiments we varied the position of the FoE to minimize any effects associated with repetitive presentation of the same stimuli. Such a manipulation is clearly unnecessary for the models. Second, since the model cannot “see” the stimulus for 2 s we attempted to achieve parity between flow information provided in trials for human and model observers. It has been estimated that heading judgments require approximately 270 ms (van den Berg, 1992) of optic flow. Over a 2 s trial this would equate to approximately seven independent heading estimates. Consequently on a single trial for the simulated observers, the model returned the average heading estimate obtained over seven independent estimates, each based on two frame (separated by 270 ms) flow-fields. Third, we needed to take into account the temporal integration window that characterizes human perception. As noted in the methods above, it is estimated that motion perception involves integrating information over 100 ms (Di Lollo, 1980). Given the limited lifetime of our dots (250 ms) we estimate human observers had access to 40% more dots at any given moment. Consequently in each number of dots per frame condition we provided the heading models with 40% extra dots. Finally, in order to get an accurate estimate of each model's performance profile we ran the models in each condition until the standard error of the mean for that condition dropped below a threshold value of 0.2°. In addition, we were also keen to investigate whether the profile of variability in the different conditions matched that of humans. Consequently we calculated the standard deviation over 1000 trials for each condition for comparison with human observers' precision data.

Analysis

The analyses undertaken for the model observers are similar to those for humans. We took the data for each model, fitted a psychometric curve (as described in the section above), and calculated the 75% threshold. The weighted (based on the goodness of fit of the psychometric function) means of these thresholds were then used to compare overall model performance.

Results—models

In Figure 5 we show the raw threshold data for the different models. Clearly the models perform differently in response to the manipulations of flow quality and quantity.

Figure 5

Raw data for the model observers. Squares are for no noise, circles for noise level and 1 and diamonds noise level 2. Top row shows the thresholds recovered in each condition. Simulations terminated when sufficient replications had been conducted to drive the standard error of the mean below 0.2°. Bottom row shows the variability (SD) in the model estimates over 1000 repetitions. Considering the thresholds first (top row in Figure 5), we see that all models exhibit sensitivity to noise—as noise is increased the thresholds tend to increase. However, WC99 appears to be rather more robust than any other model to the noise in the flow field. In addition, all models except for the LHP80 model in the noisy conditions and the P92 and HJ92 models in the no noise condition, exhibit a characteristic pattern of dependence on the number of dots per frame such that as the quantity of flow increases the thresholds decrease up to a point at which the performance reaches a ceiling. In contrast, for LHP80 although thresholds do appear to depend on the quantity of flow when the stimulus contains no noise however, this dependence is markedly reduced when noise is added to the stimulus. We note also that WC99 appears to differ from the other models in the sense that performance continues to improve beyond the 100 dots per frame level. In contrast, for all other models the performance is stable by around 50 dots per frame (or below this point in the no noise condition). Turning to the variability data (bottom row of Figure 5) we see that the standard deviations appear to largely follow the patterns of the thresholds. However this is not the case for the HJ92 model for which variability is independent from the flow quantity manipulation for all levels of the flow quality factor.

Human vs. models comparison

A preliminary comparison of the model and human data suggests that none of the models provides a perfect fit to the data. In some conditions human observers outperform the models (e.g., in the noise level 2 conditions). In other conditions certain models outperform the humans (e.g., P92 and HJ92 in the no noise conditions). We now discuss similarities and differences between models and the human data for each model in turn.

P92

The pattern of data obtained from this model exhibits some qualitative similarities to that obtained from human observers. Thresholds show sensitivity to the flow quality and flow quantity manipulations. In addition, the magnitudes of the P92 simulated thresholds appear closer than any other model to those seen in human observers. However, P92 thresholds appear to be too low when there is no noise but there are few dots in the field suggesting that this model can cope considerably better than humans in sparse flow fields when there is no noise.

WC99

This model also demonstrates some qualitative similarities to the human data. In particular it shows sensitivity to both the quantity and quality manipulations. However, for this model, thresholds are considerably higher than those for humans. This is particularly evident in the lowest flow quantity conditions but even at 50 dots per frame thresholds are higher than those seen for humans. In addition, whilst the performance of human observers appears stabilize as the flow quantity increases (and based on the data in Figure 3A we suggest that stabilization occurs at around 15–35 dots per frame), WC99 thresholds still improve at 200 dots per frame relative to 100 dots per frame.

HJ92

The pattern of thresholds exhibited by HJ92 is similar to those of P92 but considerably higher when noise is present in the flow field. Consequently although a similar dependence on flow quality and quantity is observed, human performance is rather more robust to noisy flow fields. In contrast HJ92 performs better than humans when there is no noise present and, similar to P92, does not show a large increase in threshold when the flow field is noise-free but sparse.

LHP80

Of the four models tested LHP80 appears to fit the human data least well. First, thresholds recovered by this model are considerably higher than those from humans. In addition, although there is some dependence on flow quantity when there is no noise present, unlike humans LHP80 still improves as the quantity of flow increases from 50 to 100 dots per frame. In addition, it appears that the model cannot tolerate the addition of noise to the flow field. In comparison human observers appear relatively robust to this manipulation.

Regression analyses

In order to compare the performance profile of each model to the human data more formally, we regressed each of the model thresholds, t, and standard deviations, s, against the human data (t, s) using simple linear models of the form: We were particularly interested in the quality of the fits (as measured by the R2 statistic). We can think of this statistic as telling us about whether the human and model data have a similar pattern or shape. For a model which provided a good approximation to the human data we would expect fits to both thresholds and standard deviations to produce high R2 values. We were also interested in the value of the slope parameter β for the threshold fits. This parameter reflects the gain of the model with respect to the human data and should be close to 1 if model and human thresholds are similar in magnitude. Primarily we were interested in the fits to all the data obtained across both experiments (data in Figures 4B,D—referred to as “All” data) but for completeness we also present the fits to the separate data obtained in the primary experiment (data in Figure 3—referred to as “5–200” data) and the follow up experiment (data in Figures 4A,C—referred to as “5–50” data). The results of the regression analyses are shown in Tables 3 and 4. Table 3 shows the R2 statistic obtained for each model when regressing both thresholds and standard deviations to the human data. Table 4 show the associated values of the gain parameter β.

Table 3

	Threshold			Standard deviation
	5–50	5–200	All	5–50	5–200	All
P92	0.239	0.357^*	0.315^*	0.418^*	0.323	0.427^*
WC99	0.779^*	0.827^*	0.814^*	0.363^*	0.748^*	0.419^*
HJ92	0.212	0.320	0.274^*	0.143	0.025	0.086
LHP80	0.561^*	0.447^*	0.506^*	0.119	0.019	0.061

Asterisks indicate that the fit was significant.

Table 4

Gain parameters (β) for each model for both the thresholds and the standard deviations.

	Threshold			Standard deviation
	5–50	5–200	All	5–50	5–200	All
P92	0.240	0.489^*	0.352^*	0.134^*	0.136	0.140^*
WC99	0.259^*	0.283^*	0.264^*	0.095^*	0.076^*	0.063^*
HJ92	0.097	0.188	0.139^*	0.034	0.014	0.026
LHP80	0.334^*	0.238^*	0.250^*	0.071	0.019	0.040

Asterisks indicate that the fit was significant.

. Asterisks indicate that the fit was significant. Gain parameters (β) for each model for both the thresholds and the standard deviations. Asterisks indicate that the fit was significant. First we note that with respect to standard deviations only P92 and WC99 produce significant fits to the data. When all the data in both experiments are considered, the R2 value for WC99 and P92 are comparable, interestingly WC99 is rather better than P92 for the 5–200 data in isolation. Turning to the thresholds we see that WC99 stands out as providing by far the best fit to the data (in terms of variance explained) in both experiments. However, fits for all other models are also significant. Although it does not provide the best fit to the human data, the P92 model has the highest gain (closest to 1) indicating that thresholds for this model are closest in magnitude to those of human observers. It should also be noted that the R2 value for P92 is low primarily due to the inability of this model to capture human performance when the flow field contains fewer dots. Overall this analysis suggests that although WC99 provides the best fit to the data and P92 produces thresholds closest in magnitude to the human data, no model can simulate the human data satisfactorily. As noted above three models (P92, WC99, HJ92) have an implicit free parameter representing a form of model resolution and this was set to 1° in all three cases. In the case of P92 this parameter controls the resolution of the array of template directions. In the case of WC99 it controls the width of the columns spanning the space of optic flow. For HJ92 there is a parameter which controls the difference between the sampling resolution of the candidate headings. It is possible that the thresholds obtained are critically dependent on the resolution parameter for each model which would impact upon the fits to human data. As a consequence we re-ran all the models (except LHP80) and adjusted the resolution parameter to be 0.5° rather than 1°. The new model simulations are shown in Figure 6. We then repeated the regression analyses and the results are shown in Tables 5 and 6.

Figure 6

Table 5

	Threshold			Standard deviation
	5–50	5–200	All	5–50	5–200	All
P92	0.278^*	0.420^*	0.343^*	0.328^*	0.316	0.303^*
WC99	0.775^*	0.825^*	0.801^*	0.435^*	0.872^*	0.522^*
HJ92	0.210	0.321	0.272^*	0.146	0.042	0.092
LHP80	0.561^*	0.447^*	0.506^*	0.119	0.019	0.061

Asterisks indicate that the fitting was significant.

Table 6

Gain parameters (β) for each model for both the thresholds and the standard deviations when the models were rerun with a resolution of 0.5°.

	Threshold			Standard deviation
	5–50	5–200	All	5–50	5–200	All
P92	0.733^*	1.362^*	1.077^*	0.303^*	0.293	0.306^*
WC99	0.251^*	0.285^*	0.280^*	0.076^*	0.084^*	0.075^*
HJ92	0.099	0.191	0.142^*	0.039	0.021	0.030
LHP80	0.334^*	0.238^*	0.250^*	0.071	0.019	0.040

Asterisks indicate that the fitting was significant.

Raw data for the model observers with a resolution of 0.5°, where appropriate. Squares are for no noise, circles for noise level 1 and diamonds noise level 2. Top row shows the thresholds recovered in each condition. Simulations terminated when sufficient replications had been conducted to drive the standard error of the mean below 0.2°. Bottom row shows the variability (SD) in the model estimates over 1000 repetitions. . Asterisks indicate that the fitting was significant. Gain parameters (β) for each model for both the thresholds and the standard deviations when the models were rerun with a resolution of 0.5°. Asterisks indicate that the fitting was significant. The R2 values for the three models which were susceptible to resolution effects have remained relatively unchanged suggesting that although this free parameter affects thresholds it cannot change the quality of the fit. WC99 still stands out as providing the best fit to the human data. However, by changing the resolution the impact on the gain parameter is more marked for certain models. In particular, we note that the gains for P92 are now considerably closer to 1 than any other model (they have remained relatively stable for WC99 and HJ90). As a consequence we suggest that it may be possible to tune the performance of certain models (e.g., P92) to produce similar magnitude thresholds to human observers by manipulating the resolution parameter.

Discussion—model data

In summary the simulations conducted and subsequent analyses suggest that HJ92 and LHP80 are least consistent with human data. In contrast WC99 and P92 do show some similarities to the human data although in different respects. Whilst WC99 provides a better fit to the data overall it appears to exhibit rather higher thresholds than humans. Conversely P92 fits the data less well (primarily due to its inconsistency with human data below 50 dots per frame) but can produce thresholds which are closer in magnitude to those of human observers. Ultimately no model tested provides a good approximation to the thresholds exhibited by humans.

Improving the models

One anonymous reviewer pointed out that human optic flow processing might be affected by factors such as non-isotropic deployment of attention over the flow field and/or a drop off with increasing eccentricity of the precision of coding motion. Another anonymous reviewer suggested that performance might be improved by whitening the input optic flow (to minimize statistical bias – see e.g., Raudies and Neumann, 2012). The authors of the models we tested did not incorporate such features in their models. We don't know whether that was because they did not consider these issues, whether they did consider them and they judged them irrelevant, or whether they did not consider them relevant to the immediate aim of describing a novel model. In response to the reviewers' comments we decided that we would extend and retest the models in the cases where we felt there was a persuasive biological or empirical basis. Consequently, we conducted further simulations to address eccentricity/precision and the deployment of attention by adding eccentricity dependent noise and changing the spatial configuration of optic flow respectively (these manipulations are explained in Appendix D). We decided not to implement an unbiasing operation because, although there is some suggestion that unbiasing might occur via non-linear response properties of visual neurons (Lyu and Simoncelli, 2009), at least in the case of optic flow processing, there is evidence that human observers do not correct for statistical biases. For example, Hogervorst and Eagle (1998) examined optic flow processing and the recovery of structure from motion and elegantly demonstrated that humans exhibit systematic errors in the estimation of scene structure estimation from optic flow that are not compatible with an unbiasing process. The results of additional simulations to investigate eccentricity noise scaling and spatial re-configuration of attentional scope to focus on smaller but potentially more informative portions of the scene are described in Appendix D. To summarise neither manipulation could account for the differences between model and human performance and thus our conclusions remain unchanged.

General discussion

We have investigated the dependence of human heading estimation on the quality and quantity of information in the optic flow field. This was done to provide a reference performance profile for human observers against which we could compare models of optic flow processing to estimate heading. We found that human performance is sensitive to manipulations in both flow quality and quantity. There appears to be a critical value of flow quantity below 50 dots per frame [possibly in the range of 25–35 dots per frame (Warren et al., 1988)], such that below this point there is a substantial deterioration in performance and above this point performance is relatively stable. With respect to the flow quality we found that human performance degrades as more directional noise is added to the flow field and that this effect is enhanced when the quantity of flow is higher. Inspection of the threshold profiles suggested that the models approximated the human data with varying degrees of success. The LHP80 model did not exhibit the characteristic dependence on quantity of flow in the noisy conditions. HJ92 appeared to be less robust to noise when compared to human observers. While WC99 thresholds shared some of the characteristics of the human data, it appeared less sensitive to the noise manipulation. Finally although P92 produced the appropriate pattern of thresholds with approximately the correct magnitude it could not account for the data when the flow field was sparse. Subsequent regression analyses found that LHP80 and HJ92 were least successful in fitting the standard deviation and threshold data. Two models, WC99 and P92, were more successful, although for different reasons. WC99 captured the shape of the human performance profile, which produced the higher R2 values than P92, but, given an appropriate resolution parameter P92 could produce thresholds which were closer in magnitude to those of human observers. The fact that LHP80 and HJ92 are least consistent with human data suggests that they are unlikely to provide a good description of the human heading estimation mechanism. However, it should be noted that the motivation behind the design of these models was not to simulate human heading recovery since they represent purely mathematical treatments of the information available in the flow field and an algorithm for recovery of heading. It is interesting that it was the Bayesian model we tested (WC99) that showed most consistency with human data. This model employs a statistical analysis of the flow field based on simple heuristics about the position of the FoE. In addition, performance of the model also depends upon two other parameters (Appendix B) which are related to the probability of heading direction being between two points in the scene if those points are converging. It may be possible to further tune this model to simulate human performance more closely by adjusting these parameters. At the very least our data should prompt a re-examination of this little-cited model and the principles that underpin it. Of the two models which did provide better fits to human behaviour one is motivated by human neurophysiology. P92 was based upon the properties of speed and direction tuned cells in MT and the global motion sensors which form templates in this model are at least consistent with the idea of integrating local motion responses from MT in MST. However, as noted, P92 fails to capture the performance of human observers when the flow field is relatively sparse; in particular it performs rather better than humans in such circumstances. Therefore P92 would need to be extended or modified to better capture human performance with sparse flow fields.

Conclusions

In conclusion we suggest that of the four models tested two stand out as most consistent with human data. One of these models is based on a heuristic analysis of the flow field combined with a Bayesian combination of available evidence for heading direction. The other is based upon properties of cells in neural motion processing regions. We note however, that none of the models considered is able to fully capture human heading performance in response to manipulation of the quantity and quality of information in the flow field.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

44 in total

1. A cortical architecture on parallel hardware for motion processing in real time.

Authors: Karl Pauwels; Norbert Krüger; Markus Lappe; Florentin Wörgötter; Marc M Van Hulle
Journal: J Vis Date: 2010-08-18 Impact factor: 2.240

2. Visual guidance of locomotion.

Authors: K R Llewellyn
Journal: J Exp Psychol Date: 1971-12

3. The response properties of cells in the middle temporal area (area MT) of owl monkey visual cortex.

Authors: S Zeki
Journal: Proc R Soc Lond B Biol Sci Date: 1980-02-29

4. Why two eyes are better than one for judgements of heading.

Authors: A V van den Berg; E Brenner
Journal: Nature Date: 1994-10-20 Impact factor: 49.962

5. Direction and orientation selectivity of neurons in visual area MT of the macaque.

Authors: T D Albright
Journal: J Neurophysiol Date: 1984-12 Impact factor: 2.714

6. The middle temporal visual area in the macaque: myeloarchitecture, connections, functional properties and topographic organization.

Authors: D C Van Essen; J H Maunsell; J L Bixby
Journal: J Comp Neurol Date: 1981-07-01 Impact factor: 3.215

7. Phantom motion after effects--evidence of detectors for the analysis of optic flow.

Authors: R J Snowden; A B Milne
Journal: Curr Biol Date: 1997-10-01 Impact factor: 10.834

8. Sensitivity to optic flow in human cortical areas MT and MST.

Authors: A T Smith; M B Wall; A L Williams; K D Singh
Journal: Eur J Neurosci Date: 2006-01 Impact factor: 3.386

9. Retinal flow is sufficient for steering during observer rotation.

Authors: Li Li; William H Warren
Journal: Psychol Sci Date: 2002-09

10. Nonlinear extraction of independent components of natural images using radial gaussianization.

Authors: Siwei Lyu; Eero P Simoncelli
Journal: Neural Comput Date: 2009-06 Impact factor: 2.026

8 in total

1. Heading perception depends on time-varying evolution of optic flow.

Authors: Charlie S Burlingham; David J Heeger
Journal: Proc Natl Acad Sci U S A Date: 2020-12-16 Impact factor: 11.205

2. Perception of object motion during self-motion: Correlated biases in judgments of heading direction and object motion.

Authors: Xing Xing; Jeffrey A Saunders
Journal: J Vis Date: 2022-10-04 Impact factor: 2.004