Form and Function in Information for Visual Perception.

Abstract

Visual perception involves spatially and temporally coordinated variations in diverse physical systems: environmental surfaces and symbols, optical images, electro-chemical activity in neural networks, muscles, and bodily movements-each with a distinctly different material structure and energy. The fundamental problem in the theory of perception is to characterize the information that enables both perceptual awareness and real-time dynamic coordination of these diverse physical systems. Gibson's psychophysical and ecological conception of this problem differed from that of mainstream science both then and now. The present article aims to incorporate Gibson's ideas within a general conception of information for visual perception. We emphasize the essential role of spatiotemporal form, in contrast with symbolic information. We consider contemporary understanding of surface structure, optical images, and optic flow. Finally, we consider recent evidence about capacity limitations on the rate of visual perception and implications for the ecology of vision.

Keywords: 3D perception; attention; capacity/resolution; motion; optic flow; shape; spatial vision; spatiotemporal factors; stereopsis; temporal processing

Year: 2021 PMID: 35003612 PMCID: PMC8728782 DOI: 10.1177/20416695211053352

Source DB: PubMed Journal: Iperception ISSN: 2041-6695

The basic concern of psychophysics is the nature of information for perception. Gibson’s (1950) theory of visual perception was based on psychophysics: “The first place to look is obviously the retinal image. If, contrary to past teaching, there are exact concomitant variations in the image for the important features of the visual world a psychophysical theory will be possible” (p. 61). Stevens (1951) offered a related assessment: “In a sense there is only one problem of psychophysics, namely, the definition of the stimulus. In this same sense there is only one problem in all of psychology — and it is the same problem” (p. 31).

When Gibson and Stevens made these statements, scientific psychology was still often considered a study of ‘responses’ to ‘stimuli’. A stimulus was the cause, and a response was the effect. Neither term, however, was objectively defined or consistently used. Gibson (1960) pointed out that the word ‘stimulus’ had so many different meanings that its explanatory value was obscure. He concluded his review of ambiguities of the ‘stimulus’ concept with a proposal that “We need to know the laws of stimulus information” (p. 702). ‘Stimulus’ and ‘information’ entail different properties, however. Neither Gibson nor Stevens had a good alternative to the term ‘stimulus’ for the input to perception and cognition. And related problems with terminology persist!

How indeed can we understand causal relations between such distinct physical domains as the distal environment, proximal sensory stimulation, electro-chemical changes in dense neural networks, conscious experience, concepts, and actions? How do changing environmental conditions affect behavioral actions in real time? The traditional intuitive concepts of cause and effect seem insufficient. Whereas the term ‘stimulus’ suggests passive processes of sensation, perception, and response, ‘information’ suggests selective observation and purposive actions. Both terms, however, have vague usage.
Both have been generic labels for the apparent antecedents of widely varied behaviors. ‘Information’ has referred to the input for perceptual processes from detection to visual-motor coordination and for cognitive processes from memory to problem solving. ‘Information’ might refer to a discrete object or to a spatiotemporal pattern of energy, in the distal environment or at the proximal senses. Because the concept of information is essential to theories of perception, and because the term has varied meanings, a review of basic principles is useful. Two general issues include: (1) What is information—as a general concept, and for particular instances of perception and action? (2) How much information can be acquired, as a general limiting parameter of perception and performance? Gibson's perceptual research focused on the psychophysics of perceptual information, with sparse attention to the limitations of perception. Mainstream psychology, however, has often focused on the limitations of perception, though often without a clear or consistent definition of information. The following sections consider basic aspects of the concept of information.

Information Involves Dyadic Relations. It is not an Objective Thing

If the input to perception is information, then it is natural to think of it as something objective and quantifiable, independent of an observer. In technology, information is often treated as specific signals sent from a source to a destination. And observers might be said to perceive environmental information. Natural as such ideas may seem, they are misleading. A basic but counterintuitive property of information is that it involves a dyadic relationship between two separate subsystems—a correlation of corresponding variations in physically separate systems. Shannon’s (1948) mathematical theory of communication is often treated as the principal foundation for “information theory”, but it was not a theory of information as such. Rather, as Shannon emphasized, it was a theory of communication, of a correlation between a sender and a receiver. In that general sense, information involves a dyadic relation, a correlation, with corresponding variations in two or more separate domains. Treating information as an objective property or thing in any single domain causes confusions. Gibson’s (1979) ecological approach emphasized a similar idea about relationships inherent in the concepts of environment, perception, and action: “. . . the words animal and environment make an inseparable pair. Each term implies the other” (p. 8). “The environment of animals and men is what they perceive. The environment is not the same as the physical world, . . .” (p. 15). Eleanor Gibson developed a similar idea in her book on Perceptual learning and development (1969): “. . . the information in the structure of stimulation is potential; it is not necessarily picked up” (p. 14). Thus, we may identify environmental patterns that can guide an observer's perceptions and actions, but these are potential information. Various patterns may or may not be used or even noticed by a particular observer in guiding actions for particular purposes. 
The same scenario offers different information for different observers, depending on the perspectives afforded by their locations, purposes, interests, and knowledge of context. Accordingly, Gibson consistently emphasized that the perceived environment of any given observer is inherently meaningful. Chapter three of The Ecological Approach (1979), on “The meaningful environment,” begins with the following point: If what we perceived were the entities of physics and mathematics, meanings would have to be imposed on them. But if what we perceive are the entities of environmental science, their meanings can be discovered (p. 33). In contrast, meaning is supposedly irrelevant in Shannon's mathematical theory. His theory gives a general method for quantifying a correlation between a source and destination, abstracted from the specific forms of variation and from the particular functions and meanings of information. An implicit property of Shannon's system, however, is that the possible signals are known beforehand by both sender and receiver. Thus, meaning is inherent in Shannon's system before information is transmitted. In Shannon's system, information can be quantified by reduced uncertainty. A general concept of information, however, should not rest on anticipation and quantifiable uncertainty. In contrast, in Gibson's psychophysical and ecological approach, spatiotemporal images convey information by structural correspondences with environmental objects and events. For Gibson, the structure is the meaning. Another form of information is encountered in the information-processing paradigm in psychology, usually consisting of discrete objects, features, or alphanumeric symbols. Experimental psychology in the second half of the 20th century was freed from the limited causal relations implicit in concepts of ‘stimuli’ and ‘responses’. Human perception and cognition were often re-envisioned by analogy with machines for processing information. 
Information in machines is usually represented by digital symbols, and symbolic representation is usually implicit in the information-processing paradigm. Spatiotemporal structure usually is irrelevant in that paradigm. Information is not necessarily symbolic, however. Fundamentally, information is based on corresponding variations in two physically separate domains. Psychophysical information in Gibson's ecological psychology is primarily spatiotemporal.
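
The dyadic point can be illustrated with a minimal sketch in Python. It computes Shannon's mutual information for two hypothetical joint distributions over a pair of binary systems. The function name and both probability tables are illustrative assumptions and are not taken from the article. In both tables each system considered alone has the same marginal distribution, so nothing in either single domain distinguishes the two cases. Only the correspondence between the domains differs.

```python
import math


def mutual_information(joint):
    """Mutual information, in bits, of a joint probability table joint[i][j]."""
    # Marginal distributions of the two separate systems.
    p_rows = [sum(row) for row in joint]
    p_cols = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                total += p * math.log2(p / (p_rows[i] * p_cols[j]))
    return total


# Hypothetical tables (assumptions for illustration only).
# Both have marginals of [0.5, 0.5] for each system.
perfectly_correlated = [[0.5, 0.0],
                        [0.0, 0.5]]
independent = [[0.25, 0.25],
               [0.25, 0.25]]

mi_correlated = mutual_information(perfectly_correlated)
mi_independent = mutual_information(independent)
```

With these assumed tables the correlated pair carries one bit and the independent pair carries none, although each system by itself varies in exactly the same way in both cases.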

A Psychophysical Continuum From Spatiotemporal to Symbolic Information

Structural correspondences between proximal images and their distal environmental referents vary on a continuum from natural spatiotemporal similarity at one end to abstract human-made artifacts at the other, from optical images of environmental surfaces in the eyes of active observers to alphanumeric characters and other symbols. The contrast between the two extremes is the principal present concern, but a rich variety of artifacts lies between. Consider the many varieties of abstract paintings (impressionism, cubism, etc.), silhouettes, sketches, cartoons, and pictograms. Even realistic portraits, photographs, and movies extract distinguishing features from typical appearances and context. Frames, galleries, and theaters make such abstraction explicit. Information in general, from spatiotemporal to symbolic, is selective. So is perception.

The representational theory of measurement

The logic of how variations in one domain constitute information about those in another domain is exemplified by the representational theory of measurement (e.g., Krantz, Luce, Suppes, & Tversky, 1971; Roberts, 1979; Suppes & Zinnes, 1963). A brief discussion is relevant because the logic of numerical representation resembles that of information in general. Measurement is said to consist of two fundamental problems: the representation problem and the uniqueness problem, roughly analogous to semantics and syntax. The representation problem is to identify the qualitative relational structure of a set of empirical observations and show that this structure is isomorphic to a particular structure of numerical relations. Importantly, the isomorphism is defined on the relations rather than the objects. For example, using a balance pan to evaluate weights involves qualitative binary relations of inequality and equality and a concatenation operation for combining multiple objects in the same pan. Those qualitative relations may be represented by numerical relations of < and = and a summation operation, +. Thus, if ≈ is the qualitative relation ‘a equals b’, and if ‘a ⋏ b’ is the concatenation operation, then numerical representation in the real numbers, ℜ, is an isomorphism of ternary relations such as ‘a ⋏ b ≈ c’ if and only if x + y = z, for all objects a, b, c ∈ A and all real numbers x, y, z ∈ ℜ, where x, y, and z are the numbers assigned to a, b, and c.

Information often involves correspondences that are not strictly isomorphic, but the logic is similar: Information requires corresponding relational structures. The correspondence is between the relationships, not individual objects or symbols. This principle applies in particular to visual perception, where spatiotemporal structures correspond across environment, optical images, neural activity patterns, etc. Measurement theory is said to involve a second fundamental problem, concerning uniqueness of the numerical assignment.
This is an invariance issue: What numerical transformations preserve isomorphism with the empirical relations? Thus, a ratio scale is invariant under multiplication by a scalar constant. Ratios of distances are equivalent whether measured in meters, feet, or inches. Relations on interval scales are invariant under linear transformations of the numerical scale, and nominal-scale representations are invariant under any transformation that preserves the empirical categories. Invariance of corresponding relational structure is the foundation of all information. The relevant relational structure in any domain is defined by the transformations under which that relational structure is invariant. In comparing objects on a balance scale, for example, the relevant property is identified by the physical transformations that preserve those comparative relations. Relative weight is obviously invariant with changes in shape, volume, material substance, color, etc.; and the concept of mass is identified by invariance of the balance relations under changes in the gravitational field or acceleration. Similarly, visual information about environmental structure requires invariance of optical image relations involving surface shapes, colors, motions, etc. under changes in both the observer's relative position and motion within the environment and changes in environmental conditions such as ambient illumination and surrounding context.
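
The uniqueness problem can be made concrete with a small Python sketch. It checks which numerical relations survive the admissible transformations of a ratio scale and of an interval scale. The meters-to-feet case follows the text. The Celsius-to-Fahrenheit case, the helper names, and the particular numbers are assumptions added for illustration.

```python
import math

METERS_TO_FEET = 3.28084  # approximate conversion factor


def celsius_to_fahrenheit(c):
    """A linear (affine) rescaling: multiply by a constant, then shift."""
    return 9.0 * c / 5.0 + 32.0


# Ratio scale: multiplying by a constant preserves ratios of values.
d1_m, d2_m = 6.0, 2.0  # two hypothetical distances in meters
ratio_m = d1_m / d2_m
ratio_ft = (d1_m * METERS_TO_FEET) / (d2_m * METERS_TO_FEET)

# Interval scale: a linear rescaling preserves ratios of differences,
# but not ratios of the raw values.
t_c = (10.0, 20.0, 30.0, 50.0)  # hypothetical temperatures in Celsius
t_f = tuple(celsius_to_fahrenheit(t) for t in t_c)

diff_ratio_c = (t_c[3] - t_c[2]) / (t_c[1] - t_c[0])
diff_ratio_f = (t_f[3] - t_f[2]) / (t_f[1] - t_f[0])

raw_ratio_c = t_c[1] / t_c[0]
raw_ratio_f = t_f[1] / t_f[0]

ratio_scale_invariant = math.isclose(ratio_m, ratio_ft)
interval_differences_invariant = math.isclose(diff_ratio_c, diff_ratio_f)
interval_raw_ratio_invariant = math.isclose(raw_ratio_c, raw_ratio_f)
```

In this sketch the first two flags come out true and the third false. The relations that count as information on each scale are exactly those left unchanged by that scale's admissible transformations.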

Symbolic versus spatiotemporal information

By definition, the informational significance of a symbol is unrelated to its spatial and temporal form. The physical forms serve merely to distinguish between symbols. In the information-processing paradigm of 20th-century psychology, sensory information often resembled what it had been before: Sensory ‘stimuli’ and even neural ‘spikes’ could be construed as categorical symbols to be transformed or processed. Hypothesized systems for processing symbolic information were much more complex than mere ‘responses’, but the input for perception and cognition was often a categorical event. Spatiotemporal structure was usually irrelevant. In contrast, spatiotemporal form was essential to Gibson's ecological approach. From his 1950 book onward, Gibson sought to elucidate corresponding structures of environmental and sensory patterns. He pointed out that observers’ perceptions and actions coordinated in real time with the dynamic structures of their environments because spatiotemporal variations of sensory patterns corresponded to those of the environment. Texture density gradients, for example, were considered information about surface slant. Corresponding spatiotemporal structures in distal environments and proximal stimulation evidently were both available and necessary to support real-time coordination of actions with environments. Thus, the senses could be considered as perceptual systems. And one could regard environments as perceived “directly”, rather than inferred logically by combining various sensory cues and prior knowledge.

Symbols and spatiotemporal forms require different perceptual systems

In Wiener’s (1948/1961) Cybernetics, information was spatiotemporal, suitable for guiding the actions of animals and analog machines. Computer engineering, however, has been dominated by digital technology. Shannon's quantification of information by binary “bits”, 0 versus 1, was useful in engineering digital systems. The idea that human perception and cognition can be considered symbol-processing systems has appealed to many scientists. Newell and Simon’s (1972) volume on Human problem solving explicates the rationale for studying human problem solving as symbol processing. A similar rationale is implicit in other research on perception and cognition. From the standard information-processing perspective, visible patterns are often represented as configurations of discrete elements — cues, features, objects, edges, letters, etc. Physical definitions of such sensory elements are often arbitrary, based on properties convenient for experimental manipulation. Spatiotemporal relations among such symbolic elements usually are not explicit. If the forms of symbols and their referents are unrelated, then the functional significance of any particular symbol pattern necessarily originates in the observer's processes for relating current and past input patterns. Such processes are then needed to detect, encode, group, integrate, categorize, interpret, and store that information. Observers’ schemas and Bayesian statistical inference are needed to interpret the environmental correlates of the sensory data patterns. Semantics and syntax of symbol systems often depend on the community of users. In theories of human and machine vision, spatial positions and relations are sometimes described by reference to extrinsic coordinates defined by retinal anatomy or photosensor array hardware. Marr (1982), for example, said it was obvious that the spatial structure of retinal images constitutes a scalar field with retinal coordinates. 
This representation of optical information separates the spatiotemporal structure of images from that of the environment, making the visual system responsible for recovering the environmental geometry. As a result, visual perception is said to be an “ill-posed” problem—requiring inferences based on prior and statistical knowledge about the environmental sources of limited image information. The ill-posed problem, however, is that framed by this representation of image information. The insufficiency of representing image information as two-dimensional (2-D) scalar fields may be seen by considering the image transformations associated with active observers in dynamic environments. Optical images are continually transformed by (a) observers’ viewing positions, (b) environmental object motions, (c) ambient illumination, (d) environmental context, and (e) observers’ prior knowledge. Image information about the environmental correlates must, therefore, be invariant or at least stable under those transformations. If corresponding spatiotemporal structures of optical images and environments can be identified, then Gibson’s (1966) consideration of the senses as perceptual systems and his (1979) ecological approach are plausible, parsimonious, and fruitful. This psychophysical approach does not mean that observers’ neural or computational processes for receiving sensory information are irrelevant. Instead, these processes reveal rather than supply the coherence of sensory information. Nor does the psychophysical approach mean that learning and past experience are irrelevant. Experienced observers learn how things vary (Gibson & Gibson, 1955). Expertise enables recognition of relevance and meaning (e.g., Chase & Simon, 1973). Neural and computational systems need to preserve real-time spatiotemporal coherence of sensory input. 
Descriptions of neural networks and brain functions, however, often involve encoding and integration of input features and supplementary association with previous patterns. Temporal processes of cortical physiology are also described as accumulating different information at different rates in different cortical areas. How exactly such neural systems might preserve coherent forms of sensory information about dynamic environments is not yet clear. A mathematical model of cortical function recently developed by Wason (2021) suggests how brain activity might preserve the spatiotemporal coherence. Specifically, cortical physiology and columnar architecture can support dynamic information transmission by coherent phase relations among distributed neural processes. Neural networks in this model function as coherent apertures, similar to phased-array radar systems and holography. A coherent aperture maintains high mutual information between inputs and outputs and also reduces the entropy of neural activity. Perception and action might be similarly coordinated by coherent phase relations among multiple neuronal spike rates. Psychophysical evidence demonstrates that the visual system is indeed very sensitive to the coherence of spatiotemporal patterns (Lappin & Bell, 1976; Lappin & Craft, 2000; Lappin, Donnelly, & Kojima, 2001; Lappin, Norman, & Mowafy, 1991; Lappin, Tadin, & Whittier, 2002). Stereoscopic acuity in particular indicates that binocular vision must function as a coherent aperture (Lappin & Craft, 1997, 2000).

Visual Information is Spatiotemporal

Two emphases throughout Gibson's research program on visual perception were that (a) vision is an active process by moving observers, and (b) the image transformations produced by moving observers and moving objects are basic forms of information about spatial structure. Gibson's observations in 1950 about the informational significance of the geometry of changing spatial patterns in the eyes of moving observers were major insights. Previously, the idea that such geometrical transformations could be information about spatial structure was barely imaginable. For most experimental psychologists of that era, space and time were separate physical dimensions. Retinal images had two spatial dimensions. And a third dimension in depth was necessarily an inference from “cues” such as occlusion, linear perspective, and motion parallax. Motion perception and form perception seemed functionally separate. Information in general and information for visual perception in particular is based on invariance under transformations. A vast collection of experimental evidence over the past 50 years clearly illuminates the fundamental nature of spatiotemporal structure at all levels of visual perception. Rogers’ (2021) paper “Optic flow: Perceiving and acting in a 3-D world” in this special issue describes Gibson's influence on our current understanding of perception of three-dimensional (3-D) structure from motion. Landmarks in that development include Johansson's studies of relative 2-D motion perception (1950/1994) and of biological motion perception (1973), Braunstein’s (1976) monograph on Depth Perception through Motion, and Rogers and Graham’s (1979) study of motion parallax. No one who has seen those vivid demonstrations can still believe that space and time are visually independent. 
Indeed, vision is more sensitive to spatial relations in moving than in stationary patterns (Lappin & Craft, 2000; Lappin, Bell, Harm, & Kottas, 1975; Lappin, Donnelly, & Kojima, 2001; Lappin, Tadin, & Whittier, 2002). And our nervous systems evidently lose almost no information about image motion from the retina through the cortex (Borghuis, Tadin, Lankheet, Lappin, & van de Grind, 2019).

Surface Structure is a Basic Form of Visual Information

The dynamic structures of environments, optical images, neural networks, and conscious experience have distinctly different spatial dimensions, spatial and temporal frequencies, energies, material elements, and structural features. What variations in these diverse domains can explain their corresponding spatiotemporal structures? One aspect of that problem now seems understandable, at least in broad outline: Surface structure is a basic form of visual information. The spatiotemporal structure of 2-D images corresponds to the 3-D spatial structure of environmental surfaces. Gibson emphasized the critical role of surfaces in his (1979) ecological approach. His nine “ecological laws of surfaces” include the following (pp. 23–24):

All persisting substances have surfaces, and all surfaces have a layout.
Any surface has a characteristic shape, . . .
Any surface has a characteristic texture, depending on the composition of the substance.
An illuminated surface may absorb either much or little of the illumination falling on it.
A surface has a characteristic reflectance, depending on the substance.

These were insightful points, well ahead of visual science at that time. Subsequent research has supported and elaborated these ideas. We now know that information for perceiving the spatial shape, colors, and substances of the environment involves the optical image structure of surfaces. Perceived environmental surfaces are not inferred from simpler visual “cues”, but correspond directly to the spatiotemporal optical images of surfaces. Photos of common scenes and objects in Figure 2 may suggest some of the detail about the 3-D shapes and substances of environmental objects provided by even stationary images. Moving one's vantage point within such 3-D scenes adds basic information by the structural invariance of transformations over multiple images, as discussed below.

Figure 1.

A psychophysical continuum of varied forms of information, from natural, spatiotemporal structure to human-made artifacts and symbols. All information is based on corresponding relational structures of variation in physically separate domains. And the relevant relational structure of any given domain is defined by the transformations under which that relational structure remains invariant and preserves correspondence with another domain. The natural spatiotemporal structures pertinent to Gibson’s ecological approach are invariant under environmental changes in context and illumination and observational changes in the observer's vantage point and motion. Symbolic information relies on consistencies of usage in social groups. Thus, spatiotemporal structure of natural information is directly meaningful, but the meaning of symbolic information derives from the observer's socially based knowledge. This illustration is based on a diagram of Gibson (1966, p. 244) on “the difference between perceptual meaning and verbal meaning”.

Figure 2.

Five photographs of everyday objects and scenes that may illustrate the rich visible detail in even stationary image information about the local shapes and material substances of environmental surfaces. The shape at each local patch on a smooth (differentiable) surface is either (a) elliptical, convex, or concave (hills and valleys), where the minimum and maximum curvatures have the same sign; (b) hyperbolic (saddle-shaped), where the minimum and maximum curvatures have opposite signs; (c) parabolic (cylindrical), where one direction is curved and the other is not; or (d) planar, where the surface is not curved in either of the two directions. Elliptical and hyperbolic regions are separated by parabolic lines, where a principal curvature changes sign. The curvature in any given direction on a smooth surface is defined by the rate of change in direction of the surface normal relative to change in surface position, a second-order spatial derivative. Directions of minimum and maximum curvature are orthogonal. The ratio of the two principal curvatures is an intrinsic property of the surface, independent of a three-dimensional (3-D) reference frame, and defined in two-dimensional (2-D) images. The magnitude of curvature in a given direction, however, is not defined in the image. These photographs also illustrate interactive relations between perceived colors and material substances. Surface microstructure affects macroscopic image structure by the way it affects the scattering of incident light. Material substance, spatial structure, and wavelength spectrum interact. Gold and silver colors, for example, do not appear in the rainbow. Nor do the color qualities of wood, skin, rocks, metal, water, etc. (The photos in this illustration were by the first author, JSL.)

Figure 3.

A schematic illustration of the approximate isomorphism of the spatial differential structures of environmental surfaces, retinal images, and perceptual systems in the visual system, brain, and experience. These correspondences have been tested and supported in psychophysical experiments. This conceptual illustration is oversimplified in that correspondences may be noisy and selective. (From Lappin & Craft, 2000, p. 8.)

Figure 4.

A schematic illustration of the qualitative second-order differential structure of the two-dimensional (2-D) neighborhood of a given point in the inter-image displacement fields associated with stereoscopic disparity or rotation in depth for each of the four possible local shapes of a smooth surface. A: planar; B and C: parabolic (cylindrical) in two different orientations; D: elliptic (concave or convex, bumps and dimples, hills and valleys); E: hyperbolic (saddle-shaped). The central point in each pattern is a stationary reference point for describing the surrounding image displacements. (These diagrams were suggested by Steven Tschantz, Department of Mathematics, Vanderbilt University.) (From Lappin & Craft, 2000.)

Information about surface shape is intrinsically 3-D

Contemporary understanding of the geometrical structure of surfaces and their images owes much to the contributions of Koenderink and van Doorn, who clarified correspondences between the differential geometry of environmental surfaces and that of their spatiotemporal images (e.g., Koenderink, 1984a,b, 1987, 1990; Koenderink & van Doorn, 1975, 1976a, b, c, 1992a, b). Two-dimensional images of surface shape constitute direct information about the intrinsic three-dimensionality of the surface. Geometric properties that define the local shape at each visible point on an environmental surface are also available in the optical images, and these properties are invariant under the geometric transformations produced by relative movements of the object and observer in 3-D space. Surface shape is defined at every point by the second-order differential structure of 3-D spatial relations over the surface. The second-order differential structure of the image and the second-order structure of the environmental surface correspond to one another—especially when there are multiple images of the object, due to binocular vision or relative rotation of the observer and object. For such spatiotemporal images, surface shape information is given by a diffeomorphism between the environmental surface and its retinal image structure (see references above for Koenderink and Koenderink & van Doorn; also Lappin & Craft, 2000; Lappin, Norman, & Phillips, 2011). Additionally, the self-occluding boundary contours of smooth surfaces are highly visible specifications of the surface shapes at those locations (Koenderink 1984a, b). 
Psychophysical experiments have also shown that (a) human vision is highly sensitive to the geometrical properties that define surface shape, and (b) this visual sensitivity is robust over transformations of lower-order image properties—translation, expansion, slant, tilt—associated with relative movements of the object and observer (e.g., Lappin, 2014; Lappin & Craft, 2000; Lappin, Norman, & Phillips, 2011; Perotti, Todd, Lappin, & Phillips, 1998; Phillips & Todd, 1996; Phillips, Todd, Koenderink, & Kappers, 2003; Todd, 2004). Thus, the visual system and brain receive direct information about the 3-D shapes of surfaces from retinal images. Visual perception of 3-D surface shape is based directly on the second-order differential geometry of retinal patterns. It is not an inference from lower order elements. Nevertheless, the images do not determine the scale of objects’ extensions in depth. And the depth scale is not reliably perceived (e.g., Norman & Todd, 1993; Todd, 2004; Todd, Oomes, Koenderink, & Kappers, 2004). As Todd (2004) pointed out, however, the set of potential shapes corresponding to a given image is highly constrained. A one-to-many mapping from image to objects is consistent with precise visual information because the set of possible objects that correspond to a given image is quite restricted.
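The local shape categories that figure in this discussion (planar, parabolic, elliptic, hyperbolic) can be illustrated numerically. The following is a minimal sketch, not the authors' method: it assumes a surface patch written as a height function z = f(x, y) in a frame aligned with the tangent plane at the point of interest, so that the eigenvalues of the 2 × 2 Hessian of f approximate the two principal curvatures. The function name `classify_local_shape` and the tolerance `tol` are hypothetical.

```python
import numpy as np

def classify_local_shape(hessian, tol=1e-9):
    """Classify a smooth surface patch from its second-order structure.

    Assumption: `hessian` is the symmetric 2x2 matrix of second derivatives
    of a height function z = f(x, y), taken in a frame where the tangent
    plane at the point is z = 0. Its eigenvalues then serve as the two
    principal curvatures.
    """
    h = np.asarray(hessian, dtype=float)
    k_min, k_max = np.linalg.eigvalsh(h)  # eigenvalues in ascending order
    if abs(k_min) < tol and abs(k_max) < tol:
        label = "planar"
    elif abs(k_min) < tol or abs(k_max) < tol:
        label = "parabolic"      # cylindrical: one curvature is zero
    elif k_min * k_max > 0:
        label = "elliptic"       # bump or dimple: same-sign curvatures
    else:
        label = "hyperbolic"     # saddle: opposite-sign curvatures
    return label, (float(k_min), float(k_max))

# A saddle patch, and the same patch with its depth stretched threefold.
saddle = np.array([[1.0, 0.0], [0.0, -0.5]])
label, curvatures = classify_local_shape(saddle)
label_scaled, curvatures_scaled = classify_local_shape(3.0 * saddle)
```

In this sketch, stretching the depth axis multiplies both curvatures by the same factor, so the category and the ratio of the two curvatures are unchanged while their magnitudes are not. That is one way to picture why shape can be specified even though depth scale is not.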

Shape is spatial information rather than stimulation

The distinction between shape as information versus stimulation—as relational structure versus energy—can be demonstrated by experiments that evaluate shape discrimination by vision, haptics (active touch), and cross-modal comparisons. Gibson (1963) referred informally to such demonstrations without the specific data, but Farley Norman and Flip Phillips (Norman et al., 2004; Norman et al., 2012; Phillips et al., 2009) have carefully evaluated haptic shape discriminations, using both natural objects (bell peppers) and replicas of Gibson's “feelies” (10 small solid sculptures). The general results are clear: solid shapes can be discriminated with similar accuracy visually, haptically, and by cross-modal comparison. Thus, shape is a basic form of information—invariant with changes from optical to mechanical energy.

Shape, texture, reflectance, color, and material substance are covariant and interactive aspects of image information about environmental surfaces

Optical images are reflections from surfaces. The energy and spectral composition of images are jointly determined by extrinsic illumination and intrinsic reflectance of the surfaces. Long-standing questions concern the optical information that yields perceptual constancies of object lightness and color despite changes in ambient illumination. Related questions concern the perceived material substance of objects. The trichromatic theory of color vision is well established: To an excellent approximation, the perceived color of light of any wavelength composition can be visually matched by a linear mixture of red, green, and blue primary components. How, then, can this theory be insufficient to describe the perceived colors and substance of objects such as metals (gold, silver), glass, wood, skin, fur, rocks, fabrics, plastics, vegetation, paper, etc.—as seen in Figure 2? What is the optical image information for perceiving material substance? These questions arise within the science of ecological optics that Gibson presciently outlined in his 1966 and 1979 books. In the present special issue on Gibson's ecological approach, Todd (2020) gives an excellent history, introduction, and survey of contemporary psychophysics of ecological optics. The reader is directed to that paper as background framework and illustration to supplement the present brief discussion. The reflectance of light at each point on the surface of a solid object is specified by its bidirectional reflectance distribution function (BRDF) — the percentages of radiant energy of given wavelengths from given directions reflected in given directions (Nicodemus et al., 1997). In one sense, the physics and geometry are simple, but the possible combinations of specific parameters are unlimited. 
The range of potential illumination directions at a given point on an opaque surface is a hemisphere above the tangent plane at each surface point; and the percentages of light of given wavelengths reflected in varied directions from that point are distributed over a similar hemisphere of directions. Thus, the BRDF of a given object depends on its material composition, which affects its wavelength-dependent reflectance, absorption, and spatial scattering of light in varied directions above the surface and also below in the case of translucent materials. The BRDF is the optical information about the color and material substance at a given surface point on a given object. It is independent of the object's shape and illumination. Optical images, however, are images of potentially moving objects in the eyes of often-moving observers. The image of a stationary object in the eye of a stationary observer or camera is a temporally frozen spatial pattern of luminous energy and wavelength composition — for simplicity, patterns of ‘shading’ and ‘color’. These shaded colored images are the combined effects of three sets of factors—photometric variations in the intensities, wavelengths, and directional distribution of illumination at each point on the surface, geometric variations in surface shape and orientation relative to the directions of both illumination and image position, and the BRDF pattern determined by the object's material substance. What, then, is the image information that enables perception of the object's shape, color, and material substance? How do we distinguish an object's shape from its color and material substance? What optical image properties are robust under variations in illumination and viewing direction? 
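The confounding of photometric, geometric, and material factors in a single image value can be made concrete with a deliberately simplified sketch. It assumes an ideal matte (Lambertian) surface, which is far simpler than a general BRDF, and the function name `lambertian_intensity` and all parameter values are hypothetical.

```python
import numpy as np

def lambertian_intensity(albedo, normal, light_dir, light_strength=1.0):
    """Image intensity at one point of an ideal matte surface.

    Assumption: Lambertian reflectance, so intensity is the product of
    albedo (material), illumination strength (photometric), and the cosine
    of the angle between surface normal and light direction (geometric).
    """
    n = np.asarray(normal, dtype=float)
    l = np.asarray(light_dir, dtype=float)
    n = n / np.linalg.norm(n)
    l = l / np.linalg.norm(l)
    return albedo * light_strength * max(0.0, float(n @ l))

overhead = [0.0, 0.0, 1.0]
# Three different scenes that yield the same local intensity:
dark_paint_strong_light = lambertian_intensity(0.4, [0, 0, 1], overhead, 2.0)
light_paint_weak_light = lambertian_intensity(0.8, [0, 0, 1], overhead, 1.0)
white_paint_slanted = lambertian_intensity(1.0, [0.6, 0.0, 0.8], overhead, 1.0)
```

All three values agree, so a single point of shading cannot separate material, illumination, and surface orientation. Any separation has to come from relations across the image, which is the question the text goes on to address.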
At least partial answers are offered by a basic insight recently clarified and evaluated in elegant psychophysical experiments by Marlow and Anderson (2021): The photometric, geometric, and material factors have correlated and interactive effects on image structure. Surface shape and reflectance characteristics covary and are jointly specified by certain “photogeometric” constraints on images. Marlow and Anderson investigated two such constraints: (1) The surface shape and reflectance of an object are constrained and specified near self-occluding boundary contours. Because image shading due to surface reflectance vanishes at the boundary contours, external effects of illumination, shadows, and occluding objects are segregated from effects of surface reflectance at the boundary contours (Anderson & Winawer, 2005; Marlow, Mooney, & Anderson, 2019). And the 2-D curvature at each point on the boundary contours identifies the 3-D curvature and shape: positively curved contours identify convexities, negatively curved contours identify saddle-shaped regions, and straight contours identify edges of cylindrical or planar regions (Koenderink, 1984a, b). (2) Correlated variations in surface shape and image shading gradients also distinguish convexities versus concavities and translucent versus opaque materials. Specular highlights occur at points of highly curved convexities of reflective surfaces and run along lines of minimal curvature, outlining bumps and dimples (Fleming, Torralba, & Adelson, 2004; Todd, Norman, & Mingolla, 2004). Marlow and Anderson (2021) showed in addition that image shading gradients due to the surface orientation and convexity versus concavity are systematically different for opaque and translucent materials, with translucent materials producing shallow shading gradients and more homogeneous shading at concavities.

A third form of optical information about material substance and color involves the scattering of light.
Large effects of surface microstructure on macroscopic image shading were described by Tadin, Haglund, Lappin, and Peters (2001), who quantified the light scattering distributions of roughened silica glass and correlated these with the root mean square (RMS) roughness of the microscopic surface topography on scales comparable to the wavelength of light—from about λ/10 to 10λ. Unroughened or slightly roughened glass was transparent, but greater roughness produced highly frosted opaque surfaces without visible specularity. The shift from specular to broad scattering was abrupt, occurring at RMS roughness approximately equal to the wavelength of light. Thus, relative roughness was greater for green than for red light and scattering of green light was both greater and qualitatively different from that of red light. This microscopic texture is invisible, but it has fully visible effects on image shading gradients and probably perceived color as well. Evidently, optical information about the rich variations of environmental surfaces—with characteristic shapes, colors, lightness, opacity, textures, and material substance—involves correlated and interactive effects on images. And this information about environmental objects seems to be robust under variations in illumination and viewing direction. The information is optical, not inferential.

Brief Comments About Information for Perceiving and Navigating 3-Dimensional Space

Visual space is often conceived as an abstract, extrinsic reference frame that is independent of the objects it contains. The intuitive 3-D space of our experience and actions is often regarded as an abstract empty framework inferred from 2-D retinal images, depth cues, and statistics of past experience. Gibson rejected such concepts of abstract visual space as a fundamental error (e.g., 1979, pp. 3, 149). Gibson suggested that spatial layout of the ground plane and other environmental surfaces must constitute intrinsic spatial information about depth. As emphasized in a previous section, however, the depth scales of solid objects and of separations between objects are not determined by their retinal images, even in moving and stereoscopic images. The empirical unreliability of depth discriminations contrasts with the subjective certainty and clarity of our visual experiences and intuitions about environmental spaces. A salient characteristic of the long history of research on space perception is its inconsistency, both empirically and theoretically. Perceived spatial relations vary with task demands, attention, and scene context, and are often inconsistent among different vantage points and even among different aspects of the same object (Lappin, 2016; Lappin, Shelton, & Rieser, 2006). Visual awareness is ordinarily very tolerant of such inconsistencies, but they can be quite visible if one attends to them. Koenderink (2001) pointed out that such inconsistencies are commonplace and concluded that they are evidence for a “multiple-visual-worlds hypothesis”.

Active vision and navigation

As reviewed and discussed by Rogers (2021) and Warren (e.g., 2006, 2021), Gibson emphasized from 1950 onward the importance of interactive coordination of vision and locomotion. Our skills in moving through often-crowded and changing environments demonstrate the power of visual information for guiding our movements. And the optic flow that results from our actions gives continuing visual feedback about egocentric directions and relative distances. Spatial information from optic flow is not metric, but evidently it need not be for most visually guided locomotion. Gibson's early analyses laid the foundation for a new psychophysics of space perception. One of his early insights was that optical information for guiding locomotion is egocentric, defined in relation to the individual observer. The observer-specific information differs from that of other observers, and also from what might be seen in a God's-eye view by an external scientist. For a pilot landing a plane on a runway, for guiding one's steps along a winding rough forest path, or for a baseball player running to catch a fly ball, the optic flow is not only different from that of an external observer, it is also simpler—specified by optic flow in the 2-D space of the actor's eye rather than in the 3-D space for an external observer. Warren (2021) also points out that the optic flow for navigating around obstacles and through openings is structured and scaled relative to the observer's actions as well as her eyes. Movements in three dimensions are combinations of translations (forward/backward, left/right, up/down) and rotations (roll, pitch, yaw) along three perpendicular axes. These movements by either the observer or objects transform the optical array in a “lawful [manner] which leaves certain properties . . . invariant” (Gibson, 1950, p. 153). 
One of Gibson's insights was that the invariant visual information may be simplified as a dyadic spatial relation between a specific spatial target and the viewpoint of the moving actor whose motions are intended to intercept or collide with the target. Visual information for controlling locomotion in such cases is based on optic flow at the actor's viewpoint, with continuing feedback between the movements and the flow field. Consider the visual control problem for landing an aircraft. A critical part of this task is to maintain the correct glide path so that the aircraft will touch down at the desired target point on the runway. If the glide path is too shallow, the aircraft will land too far down the runway, and if the glide path is too steep, the aircraft will touch down hard in front of the desired aim point with risks in hitting objects or terrain. Gibson's analysis revealed the relation between glide path and the structure of the expanding optic flow around the pilot's aim point. If the glide path is too shallow, the expansion point moves upward toward the horizon, but if the glide path is too steep, the expansion point moves toward the pilot. Gibson pointed out that the same pattern of expansion is available to both the novice and the experienced pilot. The experienced pilot does not see more than the novice, but has learned to discriminate relevant differences in the optic flow. This particular visual control problem is but one example in a large class of problems pertinent to biology, engineering, and sports as well as the visual sciences. A large and growing research literature has identified numerous lawful relationships that constitute information available to active perceivers as they encounter static and moving objects. 
Examples include controlling time-to-collision (e.g., Bootsma & Oudejans, 1993; Gray & Regan, 1998; Lee, 1976), intercepting a moving target (e.g., Zhao & Warren, 2017), and controlling the altitude of moving aircraft (e.g., Kleiss & Hubbard, 1993; Warren, 1988).
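One widely discussed example of such a lawful relationship is that time-to-collision can be approximated from the optical angle of an approaching object divided by that angle's rate of expansion, without knowing the object's size or distance (the variable usually called tau in the literature following Lee, 1976). The sketch below is illustrative only: it assumes a constant approach speed along the line of sight, and the function names and numerical values are hypothetical.

```python
import math

def optical_angle(size, distance):
    """Visual angle (radians) subtended by an object of a given size."""
    return 2.0 * math.atan(size / (2.0 * distance))

def tau_estimate(size, distance, speed, dt=1e-3):
    """Approximate time-to-collision from optical variables alone.

    Assumption: the object approaches head-on at constant speed. The
    expansion rate is estimated by a finite difference over a short
    interval dt, as if from two successive images.
    """
    theta_now = optical_angle(size, distance)
    theta_next = optical_angle(size, distance - speed * dt)
    expansion_rate = (theta_next - theta_now) / dt
    return theta_now / expansion_rate

# Two objects of different size at 20 m, both approaching at 10 m/s.
# The physical time-to-collision is 20 / 10 = 2 s in both cases.
tau_small = tau_estimate(size=0.07, distance=20.0, speed=10.0)
tau_large = tau_estimate(size=0.70, distance=20.0, speed=10.0)
```

Both estimates come out close to 2 s even though the two objects differ tenfold in size, which illustrates how information for action can be available in the optic flow at the actor's eye without metric knowledge of the 3-D scene.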

Limited Rates and Capacities of Human Perception

The preceding discussions dealt with the nature and identification of visual information, which was the focus of Gibson's ecological research. Another basic aspect of information, however, involves quantification. If individuals’ perceptions of their environments are quantitatively limited, then such limitations are clearly fundamental for visual ecology. How might perceptual information be quantified? Shannon's (1948; Shannon & Weaver, 1949) mathematical theory of communication contributed an elegant method for quantifying both (a) the amount of information given by choices from sets of possible messages and (b) the rate at which such information is transmitted over a given communication channel. From Gibson's perspective, Shannon's symbolic definition of information was a critically misleading way to describe the spatiotemporal information for perceiving and acting. From the perspective of the information-processing paradigm, however, not recognizing perceptual and cognitive limitations was a basic shortcoming of the ecological approach. Those mutual critiques both had some validity. A principal challenge in measuring perceptual capacity has involved quantifying environmental information. However, environmental information is not objectively defined, and that was one reason for Gibson's dissatisfaction with the concept of information (e.g., 1979, pp. 62–63). After the following explanation, we will suggest an alternative approach.

Visible information is not objectively quantifiable

Information is based on corresponding relational structures. The psychophysical structures for Gibson's ecological approach are primarily spatiotemporal. For multiple reasons, such potential information is not objectively defined. In the first place, visible information is egocentric—defined relative to an individual observer. Optic flow in the eye of an individual actor is both different and simpler than that seen by an outside observer. Second, active observers select (“pick up”) information. Selected information depends on the observer's goals and actions at a particular time and place, on knowledge of the context and meaning, and on the relevance of spatiotemporal forms and symbols in that setting to the observer's current interests, attitudes, and state of mind and body. Third, the set of all possible spatiotemporal and symbolic relationships that might constitute visible information sometime someplace for someone for some purpose is practically unlimited. Potential information is certainly not arbitrary—it involves invariants over time, place, context, and so forth—but it depends on its relevance to particular discriminations and actions. Fourth, information often also involves prior uncertainties. However, potential observations in unrestricted natural settings are often unknowable beforehand. An alternative: Even if the input information is not objectively quantifiable, the rate of perception might be quantified by the rate of an observer's output behavior in coordinating actions with environmental variations.

Shannon's fundamental theorem about channel capacity

The fundamental theorem of Shannon's (1948; Shannon & Weaver, 1949) theory of communication states that for any given channel of communication there exists an upper limit on the rate at which it can transmit information. This is a physical limit, imposed by that channel's maximum rate of entropy. Shannon defines the rate of information transmission, in bits/s, as

$$R = \frac{U(Y) - U(Y \mid X)}{t},$$

where U(X) and U(Y) are the input and output uncertainties, respectively, $U(X) = -\sum_{x} p(x) \log_2 p(x)$ is the average uncertainty of the input information, U(Y|X) is the conditional uncertainty of the output given the input, and t is a time interval in seconds. Accordingly, the channel capacity, C, is the maximum value of that rate,

$$C = \max R.$$

After the publication of Shannon's theory, experimental psychology used that theory to quantify human perception and performance. Opinions about its success vary, but that research discovered what seemed at the time to be principled methods for quantifying human perception and its limits (e.g., Eriksen & Hake, 1955; Fitts, 1954; Garner, 1962; Hyman, 1953; Miller, 1956). Experimental tasks typically evaluated accuracies of “absolute judgments” in identifying stimuli randomly selected from specified sets of alternatives. Capacities measured that way averaged over many studies at around 2.5 bits, the basis for Miller’s (1956) words about “the magical number seven, plus or minus two” (−log2 (1/7) = 2.81 bits). Hyman (1953) found a linear dependence of response times on uncertainty with a slope of about 5.75 bits/s. That line of research, however, now has little influence, and Gibson regarded it as irrelevant to his ecological approach.

Shannon's quantification is quite general, applicable to physical signals of any form. However, it is a weak measurement that treats signals and messages as only nominal categories, involving merely same/different relations. Spatial and temporal order, similarity, difference, distance, connectedness, gradients, and so forth are irrelevant.
The amount of information in such categorical structures is based entirely on statistical uncertainty, on the probability distribution over the set of possible signals and messages. In Shannon's communication model, the set of possible signals is known beforehand to both sender and receiver, but that prior knowledge is not realistic for perceiving and acting in most natural environments. Moreover, categorical structures are not much help in describing environmental surfaces and images, optic flow for actions in 3-D environments, or most other information relevant to ecological psychophysics.
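The categorical character of this measure is easy to see in a small computation. The sketch below assumes a table of joint stimulus-response probabilities (rows for inputs X, columns for outputs Y) and computes the transmitted information as output uncertainty minus the conditional uncertainty of the output given the input. The function names are hypothetical, and the tables are made-up illustrations rather than data from any cited study.

```python
import numpy as np

def uncertainty_bits(p):
    """Average uncertainty, -sum p log2 p, of a probability distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # terms with zero probability contribute nothing
    return float(-(p * np.log2(p)).sum())

def transmitted_information(joint):
    """U(Y) - U(Y|X) in bits, from a joint table (rows = X, columns = Y)."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    u_x = uncertainty_bits(joint.sum(axis=1))
    u_y = uncertainty_bits(joint.sum(axis=0))
    u_xy = uncertainty_bits(joint.ravel())
    u_y_given_x = u_xy - u_x  # chain rule: U(X, Y) = U(X) + U(Y|X)
    return u_y - u_y_given_x

seven_alternatives = uncertainty_bits([1 / 7] * 7)     # log2(7), about 2.81 bits
perfect = transmitted_information(np.eye(4))           # every stimulus identified
guessing = transmitted_information(np.ones((4, 4)))    # responses unrelated to stimuli
rate_bits_per_s = perfect / 0.5                        # if each judgment took 0.5 s
```

Permuting the rows or columns of the table leaves the result unchanged, so nothing about the order, similarity, or spatial arrangement of the alternatives enters the measure.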

Measuring rates of discrimination and action

Despite the insufficiency of Shannon's methodology for identifying or quantifying environmental information, we recently found it to be useful for describing the output information rates for behavioral actions. Our experiments involved simple measures of response times (RTs) for recognizing specified targets. The rationale and computational methods are detailed in other papers (Lappin, Lowe, Reppert, Schall, & Bell, in preparation; Lappin, Morse, & Seiffert, 2016; Lappin, Seiffert, & Bell, 2020), so they are only sketched here. Simple computations yield ratio-scaled measures (in bits/s) of visual recognition rates over time following a given visual event. Using factorial experiments, we quantified the effects of visual information format on the time course of recognition. The methods offer a strategy for measuring perceptual rates. And our findings point to quantifiable limits on the rates of perception.

A few basic formulas are useful before illustrating the methods.

RT survivor function: $S(t) = P(\mathrm{RT} > t)$
cumulative hazard function: $H(t) = -\log_2 S(t)$
information gained in a particular time interval: $H(t + \Delta t) - H(t)$
estimated hazard rate, the recognition rate in a given interval: $g(t) = [H(t + \Delta t) - H(t)] / \Delta t$

These computations are simple numerical transforms of the rank-ordered RTs, but they measure the temporal processes of visual recognition. The cumulative hazard function is a ratio scale of the information acquired by a given time, and the hazard rate measures the recognition rate at a given time. The hazard functions are non-parametric descriptive measures with no assumptions about the distribution parameters or stochastic processes. In several studies, the hazard rates revealed lawful characteristics of recognition processes that were not evident in the RTs as such. Lappin et al. (in preparation) summarize the converging results of several studies that validate the interpretation of these measures.

Evidence that visual recognition occurs at a limited rate was obtained in two studies by Lappin et al. (2016) and Lappin et al. (2020).
In Lappin et al. (2016), the observers’ task was to rapidly detect a nonrandom target motion by one of a set of otherwise identical objects moving in randomly changing directions. These target motions occurred unpredictably during continuous observation periods of several minutes. To detect the target, one had to discriminate the nonrandom target motion from the random-motion background. Target motions continued until they were detected, becoming more visible with increasing duration and distance. Variations in target uncertainty and visual field complexity were based on the set size, n = {2, 4, 6, 8}, of moving objects. Figure 5 is a graph of the resulting hazard rates. To illustrate how the temporal structure of the visual process was invariant with set size, each of the rates has been multiplied by set size, thereby estimating performance for set size n = 1.
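The hazard-rate measures described in this section are simple transforms of a set of response times. The following is a minimal sketch rather than the authors' analysis code: it assumes the cumulative hazard is expressed in bits as the negative base-2 logarithm of the empirical survivor function, it does not reproduce the running-window estimation used in the published studies, and the function names and the RT values are hypothetical.

```python
import numpy as np

def survivor(rts, t):
    """Empirical survivor function: proportion of RTs longer than t."""
    return float(np.mean(np.asarray(rts, dtype=float) > t))

def cumulative_hazard_bits(rts, t):
    """Information acquired by time t, assumed here to be -log2 S(t).

    Note: this is infinite once every response has occurred (S(t) = 0).
    """
    return float(-np.log2(survivor(rts, t)))

def hazard_rate(rts, t_start, t_end):
    """Recognition rate (bits/s) over the interval from t_start to t_end."""
    gained = cumulative_hazard_bits(rts, t_end) - cumulative_hazard_bits(rts, t_start)
    return gained / (t_end - t_start)

# Hypothetical response times in seconds (not data from any study).
rts = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
s_mid = survivor(rts, 0.65)               # half of the responses are still pending
rate_late = hazard_rate(rts, 0.65, 0.85)  # bits/s gained between 0.65 s and 0.85 s
```

In this toy data set the survivor function halves between 0.65 s and 0.85 s, so one bit is gained in 0.2 s, a rate of about 5 bits/s for that interval.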

Figure 5.

Adjusted hazard rates, $n \times g(t) = n\,[H(t + \Delta t) - H(t)] / \Delta t$, as a function of response time (RT). Multiplying each hazard rate by the set size n estimates the rate for set size n = 1, illustrating a common temporal process independent of set size. Successive points on each function were estimated in running 20% windows at successive 5% intervals of the survivor function, at S(t) = {0.90 − 0.70, 0.85 − 0.65, 0.80 − 0.60, . . . 0.30 − 0.10}. A total of 16 independent data points (four independent data points for each of the four set sizes, at successive 20% intervals) were described by three parameters, with R² = 0.98. The function given by those parameters is shown by the solid lines. (This figure is adapted from Lappin et al. (2016), Fig. 3, p. 2475.)

As may be seen, the four hazard rate functions, g(t), at any given time (a) were inversely proportional to the set size, and (b) increased proportionally with RT over roughly 400–900 ms, independent of set size. Thus, hazard rates at any given RT were described as a numerical product of two functionally distinct factors: a visual process that gained information proportionally with time, invariant with set size, and a visual uncertainty factor proportional to set size, invariant with RT:

$$g(t, n) = C \times \frac{V(t)}{U(n)},$$

where V(t) represents visual integration of motion information, proportional to time over an interval of roughly 0.5 s [V(t) = (t − c)/(k − c) for c ≤ t ≤ k, V(t) = 0 for t < c, and V(t) = 1 if t > k, with c ≈ 0.4 s and k ≈ 0.9 s]; U(n) = n represents the target uncertainty and visual field complexity; and the constant C ≈ 28 bits/s represents a limiting channel capacity, the estimated maximum recognition rate for set size n = 1. Thus, the rate of visual recognition at any given time was determined by three functionally separate factors: (a) temporal integration of visual motion information, (b) a divisive effect of visual field complexity, and (c) a limiting recognition rate resembling Shannon's maximum channel capacity.
With those three parameters, the above equation accounted for 98% of the variance in 16 independent hazard rates (4 for each set size). Converging evidence of a limiting channel capacity was obtained in a second study by Lappin et al. (2020), who also used hazard rates to evaluate effects of visual field complexity. Temporal characteristics of the visual information, the detection task, and RT distributions were all different from those of the previous study. The moving patterns were more predictable, involving constant-speed linear trajectories representing “flight paths” of a number (n = {1, 2, 4, 6}) of “planes” of various colors that identified “friend”, “foe”, or “unknown” status. The observer's task was to detect a sudden change in either the color or motion direction of a randomly selected plane. In a simple detection task, the observer was required to respond quickly to a change in either color or motion direction; and a selective detection task required a response only when the change indicated an increased “threat” defined by half of the color changes and half of the direction changes. The principal result pertinent to the present discussion was that increased set size again had a temporally constant divisive effect on the hazard rates. That effect was also constant over performance differences due to changes of color versus motion direction and over differences between simple versus selective detections. These time-invariant effects of visual field complexity were striking because the recognition rates for visual changes varied markedly over time, resembling inverted V-shaped functions of a brief window of awareness that opened and closed over 200–300 ms after a display change, after which the sudden visual changes were no longer noticed. Changes in motion direction were less detectable than color changes, selective detections were less frequent than simple detections, and both of these effects varied with time. 
Nevertheless, the temporally constant divisive effects of visual set size were independent of both of those variables. Recognition rates (RT hazard rates) at any given time were again selectively influenced as products of functionally separate factors associated with (a) visual field set size, (b) type of visual change, (c) detection task, and (d) a maximum rate parameter, C ≈ 21 bits/s.

Together, these recent findings suggest that Shannon's fundamental theorem (that any physical channel has a maximum rate at which it can transmit information) probably applies to the rate of conscious visual recognition. The visual nervous system is, after all, a dynamic, energy-consuming physical system. Perhaps we should expect that visual perception will also occur at a limited rate. Evidently, spreading visual attention over multiple objects and events reduces the rate of perceiving any individual thing.

James Gibson saw Shannon's mathematical theory as irrelevant for characterizing the spatiotemporal forms of information relevant to the ecological approach. His doubts were well-founded. Accordingly, questions about possible limitations on the capacity and rate of perception have been largely overlooked in the ecological approach. Nevertheless, perceptual capacity is relevant to the dynamic relationships between environments, observers, and actors. Perceptual capacity has long been important in experimental psychology (e.g., James, 1890; Woodworth & Schlosberg, 1954; Wundt, 1894) and in the information-processing approach (e.g., Miller, 1956; Norman, 1969; Schneider & Shiffrin, 1977; Sperling, 1960). Visual information in those research paradigms, however, has typically involved varied numbers of briefly viewed, discrete, static ‘stimuli’ rather than the spatiotemporal patterns relevant to the ecological approach. Thus, a major open area for research concerns the rate at which vision acquires spatiotemporal information in real time.
The ecological approach has often investigated task environments that demand real-time spatial and temporal coordination of an action and an environmental event (e.g., Warren, 2006; Zhao & Warren, 2017), though the actions have usually focused on single well-defined events. Insufficient evidence is available about how the complexity of environmental information affects the rate of perception. We still know too little about the perceptual complexity of information in natural spatiotemporal patterns involving optic flow, 3-D spatial relations, surface shape, and photo-geometric effects. Better understanding of perceptual capacity limits is important not only for the ecological approach, but for psychological and brain sciences more generally. Questions about perceptual capacity also have widespread applications in human factors engineering, education, sports, and medicine.
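The rate model described earlier for the Lappin et al. (2016) motion-detection data can be restated as a short computation. This is a sketch under stated assumptions: it takes the product form implied by the description, a capacity constant multiplied by the temporal integration factor V(t) and divided by the set-size factor U(n) = n, and it uses the approximate parameter values quoted in the text (c ≈ 0.4 s, k ≈ 0.9 s, C ≈ 28 bits/s). The function names are hypothetical.

```python
def visual_integration(t, c=0.4, k=0.9):
    """V(t): rises linearly from 0 at t = c to 1 at t = k (times in seconds)."""
    if t < c:
        return 0.0
    if t > k:
        return 1.0
    return (t - c) / (k - c)

def recognition_rate(t, n, capacity=28.0, c=0.4, k=0.9):
    """Hazard rate (bits/s) at time t for set size n, assuming C * V(t) / n."""
    return capacity * visual_integration(t, c, k) / n

# Rates at 0.65 s for each set size used in the study, then rescaled by n.
set_sizes = [2, 4, 6, 8]
rates = [recognition_rate(0.65, n) for n in set_sizes]
adjusted = [n * r for n, r in zip(set_sizes, rates)]
```

Multiplying each rate by its set size collapses the four values onto a single number, which mirrors the adjustment used to plot the hazard rates in Figure 5.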
