Ruzena Bajcsy, Yiannis Aloimonos, John K. Tsotsos.
Abstract
Despite the recent successes in robotics, artificial intelligence and computer vision, a complete artificial agent necessarily must include active perception. A multitude of ideas and methods for how to accomplish this have already appeared in the past, their broader utility perhaps impeded by insufficient computational power or costly hardware. The history of these ideas, perhaps selective due to our perspectives, is presented with the goal of organizing the past literature and highlighting the seminal contributions. We argue that those contributions are as relevant today as they were decades ago and, with the state of modern computational tools, are poised to find new life in the robotic perception systems of the next decade.
Keywords: Attention; Control; Perception; Sensing
Year: 2017 PMID: 31983809 PMCID: PMC6954017 DOI: 10.1007/s10514-017-9615-3
Source DB: PubMed Journal: Auton Robots ISSN: 0929-5593 Impact factor: 3.000
The five main constituents of an actively perceiving agent are defined
| Active perception | Definition |
|---|---|
| Why | The current state of the agent determines what its next actions might be, based on the expectations that its state generates. These are termed Expectation-Action tuples. This would rely on any form of inductive inference (inductive generalization, Bayesian inference, analogical reasoning, prediction, etc.), because inductive reasoning takes specific information (premises) and makes a broader generalization (conclusion) that is considered probable. The only way to know is to test the conclusion. A fixed, pre-specified control loop is not within this definition |
| What | Each expectation applies to a specific subset of the world that can be sensed (visual field, tactile field, etc.) and any subsequent action would be executed within that field. We may call this Scene Selection |
| How | A variety of actions must precede the execution of a sensing or perceiving action. The agent must be placed appropriately within the sensory field (Mechanical Alignment). The sensing geometry must be set to enable the best sensing action for the agent’s expectations (Sensor Alignment, including components internal to a sensor such as focus, light levels, etc.). Finally, the agent’s perception mechanism must be adapted to be most receptive for interpretation of sensing results, both specific to current agent expectations as well as more general world knowledge (Priming) |
| When | An agent expectation requires Temporal Selection, that is, each expectation has a temporal component that prescribes when it is valid and with what duration |
| Where | The sensory elements of each expectation can only be sensed from a particular viewpoint and its determination is modality specific. For example, how an agent determines a viewpoint for a visual scene differs from how it does so for a tactile surface. The specifics of the sensor and the geometry of its interaction with its domain combine to accomplish this. This will be termed the Viewpoint Selection process |
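The Why row above centers on Expectation-Action tuples: the agent's state generates an expectation, the expectation proposes an action, and executing the action tests the inductive conclusion against observation. A minimal sketch of that loop, with all names and the toy sensing action hypothetical (not from the paper):

```python
# Hypothetical Expectation-Action tuple: the agent's state generates an
# expectation, the expectation proposes a sensing action, and executing the
# action tests the expectation (inductive conclusions must be verified).
class ExpectationAction:
    def __init__(self, expectation, action):
        self.expectation = expectation  # predicate over sensed data
        self.action = action            # callable that performs sensing

def perceive_act(tuples):
    """Run each proposed action and record whether its expectation held."""
    results = {}
    for t in tuples:
        observation = t.action()
        results[t.expectation.__name__] = t.expectation(observation)
    return results

# Toy example: expect a bright reading after pointing the sensor.
def expects_bright(obs):
    return obs > 0.5

def point_and_sense():
    return 0.9  # stand-in for an actual sensing action

summary = perceive_act([ExpectationAction(expects_bright, point_and_sense)])
```

The key point of the sketch is the final test step: unlike a fixed control loop, each expectation is a falsifiable prediction that the subsequent action confirms or refutes.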
Fig. 1 The basic elements of Active Perception broken down into their constituent components. Instances of an embodiment of active perception would include the Why component and at least one of the remaining elements, whereas a complete active agent would include at least one component from each
Details of the components of the diagram of Fig. 1. Where multiple Research Seeds are given, each addresses a different dimension of the problem; in most cases, many open problems remain for each component
| Component | Definition | Research Seeds |
|---|---|---|
| HOW: Mechanical alignment | Active control of motors (e.g., sequence of motor actions for multi-view stereo) | Moravec (…) |
| | Active control of robot body and body part position and pose (e.g., to move the robot to a location more advantageous for the current task) | Nilsson (…) |
| HOW: Priming | Active adaptation of the perceptual interpretation system for the current task and physical environment (e.g., tune the system to be more receptive to recognition of objects and events relevant to the current task) | Williams et al. (…) |
| | Active adaptation of the sensing system (e.g., to tune sensors to be more sensitive to stimuli relevant to the current task) | Bajcsy and Rosenthal (…) |
| HOW: Sensor alignment | Active control of the optical elements of a visual sensor (focal length, gain, shutter speed, white balance, etc.) (e.g., accommodation: increasing optical power to maintain a clear image of an object as it draws near) | Tenenbaum (…) |
| | Active control of non-contact, non-visual sensors, such as inertial measurement units (e.g., the choice of path along which the IMU moves to measure linear acceleration and rotational velocity) | Early twentieth century, such as rocket stabilization |
| | Active control of sensors that measure interaction with objects and the environment, such as applied forces/torques, friction, and shape (e.g., the choice of contact pattern over time) | Allen and Bajcsy (…) |
| WHEN: Temporal selection | Active prediction of when an event is expected (e.g., predicting object movement in a sequence) | Tsotsos et al. (…) |
| | Active prediction of how long an event is expected to last (e.g., predicting the temporal extent of movement in an image sequence) | Tsotsos (…) |
| WHAT: Scene selection | Active prediction of where in a scene a stimulus relevant to the current task may appear (e.g., selection of the subset of an image where a face outline can be found) | Kelly (…) |
| | Active prediction of which portion of a real-world scene to view (e.g., indirect object search, where an easy search for a semantically related object might facilitate search for a target object) | Garvey (…) |
| WHERE: Viewpoint selection | Active selection of the agent pose most appropriate for selecting a viewpoint most useful for the current task (e.g., moving an agent to a close enough position for viewing a task-related object or event) | Nilsson (…) |
| | Active selection of the pose of a sensor most appropriate for the current task (includes convergent binocular camera systems) (e.g., pointing a camera at a target with the best viewing angle for its recognition) | Brown (…) |
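The Viewpoint Selection rows describe choosing the sensor pose most useful for the current task. One common way to operationalize this is greedy next-best-view selection: score each candidate pose by how much unseen information it would reveal and pick the best. A minimal sketch under that assumption; the scoring rule, pose names, and patch-set representation are all hypothetical simplifications:

```python
# Hypothetical next-best-view sketch: score each candidate sensor pose by the
# number of currently-unseen surface patches it would reveal, then pick the
# best. Real systems also weigh motion cost, occlusion, and sensor noise.
def next_best_view(candidate_poses, seen):
    """candidate_poses maps a pose name to the set of patch ids visible
    from that pose; seen is the set of patch ids already observed."""
    best_pose, best_gain = None, -1
    for pose, patches in candidate_poses.items():
        gain = len(patches - seen)  # new information this pose would add
        if gain > best_gain:
            best_pose, best_gain = pose, gain
    return best_pose, best_gain

candidates = {
    "front": {1, 2, 3},
    "left":  {3, 4, 5, 6},
    "top":   {2, 3},
}
pose, gain = next_best_view(candidates, seen={1, 2, 3})
```

Here the "left" pose wins because it is the only candidate exposing patches not already seen; iterating this selection as `seen` grows yields a simple active exploration policy.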
Fig. 2 The current standard processing pipeline common in computer vision
Fig. 3 Active perception processing pipeline
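The contrast between Figs. 2 and 3 can be sketched in code: the standard pipeline makes one fixed feedforward pass over the input, while the active pipeline closes the loop, letting expectations choose where to sense next and stopping once they are confirmed. All names and the toy one-dimensional "world" below are illustrative placeholders, not from the paper:

```python
# Illustrative contrast between the feedforward pipeline of Fig. 2 and the
# closed-loop active pipeline of Fig. 3, using a toy 1-D brightness world.

def standard_pipeline(image, threshold=0.5):
    # Fig. 2 style: one fixed pass from pixels to labels, no feedback.
    return ["bright" if px > threshold else "dark" for px in image]

def active_pipeline(sense, positions, target="bright", threshold=0.5):
    # Fig. 3 style: expectations drive where to sense next, and sensing
    # stops as soon as the current expectation is confirmed.
    fixations = []
    for pos in positions:            # Where: candidate viewpoints, in order
        px = sense(pos)              # How: execute the sensing action
        fixations.append(pos)
        label = "bright" if px > threshold else "dark"
        if label == target:          # Why: expectation met, stop acting
            return pos, fixations
    return None, fixations

world = [0.1, 0.2, 0.9, 0.3]
labels = standard_pipeline(world)                       # processes everything
found, visited = active_pipeline(lambda p: world[p], [2, 0, 1, 3])
```

The feedforward version labels every pixel regardless of task, whereas the active version visits only one position before its expectation is satisfied; that economy of sensing is the point of the Fig. 3 pipeline.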