Structure from motion (psychophysics)

In visual perception, structure from motion (SFM) refers to how humans (and other animals) recover three-dimensional structure from an object's motion. An important function of the human visual system is recovering the three-dimensional structure of objects from different kinds of visual cues.

SFM is a motion-based depth cue: the motion of a two-dimensional projection conveys the shape of a three-dimensional object, and the cue remains effective even in the absence of other depth cues. Psychological, and especially psychophysical, studies have addressed this topic for decades.

Psychophysical studies
In a 1953 study on SFM, Wallach and O'Connell tested the kinetic depth effect. They found that the changing shadow of a rotating three-dimensional object can serve as a cue for recovering the structure of the physical object quite well. Johansson's 1973 study showed that we can perceive a human form walking or dancing simply from the projected motion of several points on the body; this motion pattern was later termed biological motion.

One proposal for how we generate a 3D surface representation of an object is that the visual system integrates information over space and time to detect the structure. Other studies indicate that SFM comprises several aspects: perception of the direction of rotation, perceived orientation of the rotation axis, spatial interpolation effects, and object recognition.

Given its complexity, SFM involves high-level visual processing. Studies have shown that area MT, rather than V1 (the primary visual cortex), is directly involved in generating the SFM percept. Neurons in MT also respond to motion parallax and signal depth sign independently of other depth cues, and MT's representation of three-dimensional structure further supports the close relationship between MT and SFM. The activity of V1 neurons, by contrast, is only indirectly related to SFM perception, since V1 receives feedback from MT.

Several studies also demonstrate the importance of motion for detecting three-dimensional structure. A 3D object can be perceived from the 2D projection of the moving object on a screen, but not from a stationary 2D image. In addition, one essential condition for accurate SFM perception is that the projection of the object must have simultaneously changing contours and lines. A relatively invariant point lifetime threshold for SFM (50-85 ms) has been found, and this threshold is close to the threshold for velocity measurement, which suggests that velocity measurement is involved in SFM processing. Given such a mechanism, the human visual system can derive an accurate SFM percept even in the presence of noise.
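The kind of display these studies describe can be sketched in code. The following is a minimal illustration, not a reconstruction of any particular experiment: it generates orthographic projections of dots on a rotating cylinder, with each dot redrawn at a new random location after a limited lifetime. The function name, the cylinder shape, the 60 Hz frame rate, the rotation speed, and the 80 ms default lifetime are all assumptions chosen for illustration.

```python
import numpy as np

def cylinder_sfm_frames(n_dots=200, n_frames=60, frame_ms=1000 / 60,
                        lifetime_ms=80.0, deg_per_frame=2.0, seed=0):
    """Return an (n_frames, n_dots, 2) array of screen positions (x, y).

    Dots lie on the surface of a unit-radius cylinder rotating about its
    vertical axis. All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Point lifetime expressed in whole frames (at least one frame).
    life_frames = max(1, int(round(lifetime_ms / frame_ms)))

    theta = rng.uniform(0.0, 2.0 * np.pi, n_dots)   # angle around the axis
    height = rng.uniform(-1.0, 1.0, n_dots)         # position along the axis
    age = rng.integers(0, life_frames, n_dots)      # staggered starting ages
    step = np.deg2rad(deg_per_frame)

    frames = np.empty((n_frames, n_dots, 2))
    for f in range(n_frames):
        # Orthographic projection: discard depth (cos(theta)) and keep
        # the horizontal position sin(theta) and the height.
        frames[f, :, 0] = np.sin(theta)
        frames[f, :, 1] = height

        # Rotate the cylinder and age every dot by one frame.
        theta = theta + step
        age = age + 1

        # Dots that reach the end of their lifetime are replaced by new
        # dots at random positions on the surface.
        expired = age >= life_frames
        n_expired = int(expired.sum())
        theta[expired] = rng.uniform(0.0, 2.0 * np.pi, n_expired)
        height[expired] = rng.uniform(-1.0, 1.0, n_expired)
        age[expired] = 0
    return frames
```

Any single frame returned by this sketch is only a flat scatter of dots; whatever depth information the sequence carries lies in how the dots move between frames. Shortening `lifetime_ms` reduces how long each dot's velocity can be tracked.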

Being a complex process, SFM requires more than an orthographic projection approximation, although many experiments have used orthographic projections. Studies have found that higher-order visual cues such as acceleration and perspective projection are involved in this process, rather than just first-order flow (meaning that SFM is partly a top-down process). Combining all orders of visual cues gives the best estimate of 3D objects.
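One way to see why perspective can add information beyond orthographic projection is to compare the image speeds of two dots on a rotating object, one on the near side and one on the far side. The sketch below is a geometric illustration only, not a model of visual processing; the helper names `project`, `rotate_y`, and `image_speed`, the viewing distance of 4 units, and the convention that the eye sits on the positive z axis are assumptions made for the example.

```python
import numpy as np

def project(points, viewing_distance=None):
    """Project (N, 3) points onto the image plane z = 0.

    viewing_distance=None gives orthographic projection; otherwise the eye
    is assumed to sit at z = +viewing_distance, looking toward -z.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    if viewing_distance is None:
        return np.stack([x, y], axis=1)
    # Perspective scaling: points nearer the eye (larger z) are magnified.
    scale = viewing_distance / (viewing_distance - z)
    return np.stack([x * scale, y * scale], axis=1)

def rotate_y(points, angle):
    """Rotate (N, 3) points about the vertical (y) axis by angle in radians."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, 0.0, s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0, c]])
    return points @ rot.T

def image_speed(points, viewing_distance=None, dt=1e-3):
    """Finite-difference image speed for a rotation of 1 radian per unit time."""
    before = project(points, viewing_distance)
    after = project(rotate_y(points, dt), viewing_distance)
    return np.linalg.norm(after - before, axis=1) / dt

# One dot on the near side and one on the far side of a unit-radius object.
near_and_far = np.array([[0.0, 0.0, 1.0],
                         [0.0, 0.0, -1.0]])

ortho_speed = image_speed(near_and_far)
persp_speed = image_speed(near_and_far, viewing_distance=4.0)
print("orthographic:", ortho_speed)
print("perspective: ", persp_speed)
```

Under orthographic projection the two dots have the same image speed, so the first-order flow alone does not say which dot is nearer. Under perspective projection the near dot moves faster than the far dot, which is one source of the additional information that perspective can provide.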