Visual routine

A visual routine is a means of extracting information from a visual scene.

Shimon Ullman, in his studies on human visual cognition, proposed that the human visual system's task of perceiving shape properties and spatial relations is split into two successive stages: an early "bottom-up" stage during which base representations are generated from the visual input, and a later "top-down" stage during which high-level primitives dubbed "visual routines" extract the desired information from the base representations. In humans, the base representations generated during the bottom-up stage correspond to retinotopic maps (more than 15 of which exist in the cortex) for properties like color, edge orientation, speed of motion, and direction of motion. These base representations rely on fixed operations performed uniformly over the entire field of visual input, and do not make use of object-specific knowledge, task-specific knowledge, or other higher-level information.
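As a rough illustration of a fixed operation applied uniformly over the whole input, the following minimal sketch computes a map of local intensity-gradient directions for a small grayscale image. It is only an analogy for a base representation such as an edge-orientation map, not a model of cortical processing: the function name `gradient_direction_map`, the central-difference operator, and the list-of-lists image format are all assumptions made for this example.

```python
import math

def gradient_direction_map(image):
    """Apply the same local operator at every interior pixel.

    `image` is a list of equal-length rows of numbers (hypothetical format).
    Returns a map of gradient directions in degrees, with None on the
    border and wherever the intensity is locally constant.
    """
    height, width = len(image), len(image[0])
    directions = [[None] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            # Central differences: no object- or task-specific knowledge is used.
            gx = image[r][c + 1] - image[r][c - 1]
            gy = image[r + 1][c] - image[r - 1][c]
            if gx or gy:
                directions[r][c] = math.degrees(math.atan2(gy, gx))
    return directions

# A vertical step edge: dark on the left, bright on the right.
step_edge = [[0, 0, 1, 1] for _ in range(4)]
direction_map = gradient_direction_map(step_edge)
```

The operator never consults what the image depicts or what question is being asked, which is the property the text attributes to the base representations.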

The visual routines proposed by Ullman are high-level primitives that parse the structure of a scene, extracting spatial information from the base representations. Each visual routine is composed of a sequence of elementary visual operators specific to the task at hand. Visual routines differ from the fixed operations of the base representations in that they are not applied uniformly over the entire visual field; rather, they are applied only to objects or areas specified by the routines.

Ullman lists the following as examples of visual operators: shifting the processing focus, indexing a salient item for further processing, spreading activation over an area delimited by boundaries, tracing boundaries, and marking a location or object for future reference. When combined into visual routines, these elementary operators can be used to perform relatively sophisticated spatial tasks such as counting the number of objects satisfying a certain property, or recognizing a complex shape.
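The way elementary operators might combine into a counting routine can be sketched on a toy grid. This is a minimal illustration under stated assumptions, not Ullman's formulation: the scene is a hypothetical list of equal-length strings in which each character stands for a cell's property, an "object" is taken to be a 4-connected group of cells sharing that property, and the helper names `index_next`, `spread_activation`, and `count_objects` are invented for this example.

```python
from collections import deque

def index_next(scene, marked, wanted):
    """Indexing / shifting focus: find the next unmarked cell with the property."""
    for r, row in enumerate(scene):
        for c, value in enumerate(row):
            if value == wanted and (r, c) not in marked:
                return (r, c)
    return None

def spread_activation(scene, start, marked):
    """Spread activation from `start` over 4-connected cells with the same
    value, marking every cell reached so it is not visited again."""
    wanted = scene[start[0]][start[1]]
    marked.add(start)
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < len(scene) and 0 <= nc < len(scene[0])
            if inside and scene[nr][nc] == wanted and (nr, nc) not in marked:
                marked.add((nr, nc))
                queue.append((nr, nc))

def count_objects(scene, wanted):
    """Counting routine: index an item, spread and mark, repeat until none remain."""
    marked = set()
    count = 0
    while True:
        focus = index_next(scene, marked, wanted)
        if focus is None:
            return count
        spread_activation(scene, focus, marked)
        count += 1

scene = [
    "RR..B",
    "R...B",
    "..R..",
    "B.RR.",
]
red_count = count_objects(scene, "R")  # two separate groups of "R" cells
```

Unlike a uniform base operation, the routine touches only the cells it is directed to: activation spreads from the indexed location, and marking keeps already-counted objects from being counted again.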

A number of researchers have implemented visual routines for processing camera images, to perform tasks such as determining which object a person in the image is pointing at. Researchers have also applied the visual routines approach to artificial map representations, for playing real-time 2D video games. In those cases, however, the map of the video game was provided directly, removing the need to deal with real-world perceptual tasks like object recognition and occlusion compensation.