Biological motion perception

Biological motion perception is the act of perceiving the fluid unique motion of a biological agent. The phenomenon was first documented by Swedish perceptual psychologist, Gunnar Johansson, in 1973. There are many brain areas involved in this process, some similar to those used to perceive faces. While humans complete this process with ease, from a computational neuroscience perspective there is still much to be learned as to how this complex perceptual problem is solved. One tool which many research studies in this area use is a display stimuli called a point light walker. Point light walkers are coordinated moving dots that simulate biological motion in which each dot represents specific joints of a human performing an action.

Currently a large topic of research, many different models of biological motion/perception have been proposed. The following models have shown that both form and motion are important components of biological motion perception. However, to what extent each of the components play is contrasted upon the models.

Neuroanatomy
Research in this area seeks to identify the specific brain regions or circuits responsible for processing the information which the visual system perceives in the world. And in this case, specifically recognizing motion created by biological agents.

Single Cell Recording
The most precise research is done using single-cell recordings in the primate brain. This research has yielded areas important to motion perception in primates such as area MT (middle temporal visual area), also referred to as V5, and area MST (medial superior temporal area). These areas contain cells characterized as direction cells, expansion/contraction cells, and rotation cells, which react to certain classes of movement.

Neuroimaging
Additionally, research on human participants is being conducted as well. While single-cell recording is not conducted on humans, this research uses neuroimaging methods such as fMRI, PET, EEG/ERP to collect information on what brain areas become active when executing biological motion perception tasks, such as viewing point light walker stimuli. Areas uncovered from this type of research are the dorsal visual pathway, extrastriate body area, fusiform gyrus, superior temporal sulcus, and premotor cortex. The dorsal visual pathway (sometimes referred to as the "where" pathway), as contrasted with the ventral visual pathway ("what" pathway), has been shown to play a significant role in the perception of motion cues. While the ventral pathway is more responsible for form cues.

Neuropsychological Damage
Valuable information can also be learned from cases where a patient has suffered from some sort of neurological damage and consequently loses certain functionalities of neural processing. One patient with bilateral lesions that included the human homologue of area MT, lost their ability to see biological motion when the stimulus was embedded in noise, a task which the average observer is able to complete. Another study on stroke patients sustaining lesions to their superior temporal and premotor frontal areas showed deficits in their processing of biological motion stimuli, thereby implicating these areas as important to that perception process. A case study conducted on a patient with bilateral lesions involving the posterior visual pathways and effecting the lateral parietal-temporal-occipital cortex struggled with early motion tasks, and yet was able to perceive the biological motion of a point light walker, a higher-order task. This may be due to the fact that area V3B and area KO were still intact, suggesting their possible roles in biological motion perception.

Background
The relative roles of form cues compared to motion cues in the process of perceiving biological motion is unclear. Previous research has not untangled the circumstances under which local motion cues are needed or only additive. This model looks at how form-only cues can replicate psychophysical results of biological motion perception.

Model
Template Creation

Same as below. See 2.2.2 Template Generation

Stage 1

The first stage compares stimulus images to the assumed library of upright human walker templates in memory. Each dot in a given stimulus frame is compared to the nearest limb location on a template and these combined, weighted distances are outputted by the function:


 * $$F_{tc}(t) = \sum_{i=1}^n e^ \left (\frac{(\mu_{tc}-p_i(t))^2}{2X\sigma}\right)$$

where $$ p_i$$ gives the position of a particular stimulus dot and $$ \mu_{tc}$$ represents the nearest limb position in the template. $$\sigma$$ represents the size of the receptor field to adjust for the size of the stimulus figure.

The best fitting template was then selected by a winner-takes-all mechanism and entered into a leaky integrator:


 * $$\tau \frac{\delta u_{1,2}(t)}{\delta t} = -u_{1,2}+i_{1,2}+w_+f(u_{1,2}(t))- w_-f(u_{2,1}(t)) $$

where $$w_+$$ and $$w_-$$ are the weights for lateral excitation and inhibition, respectively, and the activities $$u_{1,2}$$ provide the left/right decision for which direction the stimulus is facing.

Stage 2

The second stage attempts to use the temporal order of the stimulus frames to change the expectations of what frame would be coming next. The equation


 * $$\tau \frac{\delta v_{1,2}(t)}{\delta t} = -v_{1,2}(t) + w_{m,n}u(t) $$

takes into account bottom-up input from stage 1 $$(u)$$, the activities in decision stage 2 for the possible responses $$(v_{1,2})$$, and weights the difference between selected frame $$n$$ and previous frame $$m$$.

Implications

This model highlights the abilities of form-related cues to detect biological motion and orientation in a neurologically feasible model. The results of the Stage 1 model showed that all behavioral data could be replicated by using form information alone - global motion information was not necessary to detect figures and their orientation. This model shows the possibility of the use of form cues, but can be criticized for a lack of ecological validity. Humans do not detect biological figures in static environments and motion is an inherent aspect in upright figure recognition.

Overview
Old models of biological motion perception are concerned with tracking joint and limb motion relative to one another over time. However, recent experiments in biological motion perception have suggested that motion information is unimportant for action recognition. This model shows how biological motion may be perceived from sequences of posture recognition, rather than from the direct perception of motion information. An experiment was conducted to test the validity of this model, in which subjects are presented moving point-light and stick-figure walking stimuli. Each frame of the walking stimulus is matched to a posture template, the progression of which is recorded on a 2D posture–time plot that implies motion recognition.

Posture Model
Template Generation

Posture templates for stimulus matching were constructed with motion tracking data from nine people walking. 3D coordinates of the twelve major joints (feet, knees, hips, hands, elbows, and shoulders) were tracked and interpolated between to generate limb motion. Five sets of 2D projections were created: leftward, frontward, rightward, and the two 45° intermediate orientations. Finally, projections of the nine walkers were normalized for walking speed (1.39 seconds at 100 frames per cycle), height, and hip location in posture space. One of the nine walkers was chosen as the walking stimulus, and the remaining eight were kept as templates for matching.

Template Matching

Template matching is computed by simulating posture selective neurons as described by A neuron is excited by similarity to a static frame of the walker stimulus. For this experiment, 4,000 neurons were generated (8 walkers times 100 frames per cycle times 5 2D projections). A neuron's similarity to a frame of the stimulus is calculated as follows:


 * $$R_\psi(t) = \sum_{i = 1}^N \exp \left( - \frac{\left | (x_i(t), y_i(t)) - (\Chi_i, _\psi, \Rho_i, _\psi) \right \vert ^2}{2 \sdot \sigma} \right)$$

where $$(x_i, y_i)$$ describe a stimulus point and $$(c_i, r_i)$$ describe the limb location at time $$t$$; $$\psi$$ describes the preferred posture; $$R$$ describes a neuron's response to a stimulus of $$N$$ points; and $$\sigma$$ describes limb width.

Response Simulation

The neuron most closely resembling the posture of the walking stimulus changes over time. The neural activation pattern can be graphed in a 2D plot, called a posture-time plot. Along the x axis, templates are sorted chronologically according to a forward walking pattern. Time progresses along the y axis with the beginning corresponding to the origin. The perception of forward walking motion is represented as a line with a positive slope from the origin, while backward walking is conversely represented as a line with a negative slope.

Motion Model
Motion Detection in Posture Space

The posture-time plots used in this model follow the established space-time plots used for describing object motion. Space-time plots with time at the y axis and the spatial dimension at the x axis, define velocity of an object by the slope of the line. Information about an object's motion can be detected by spatio-temporal filters. In this biological motion model, motion is detected similarly but replaces the spatial dimension for posture space along the x axis, and body motion is detected by using posturo-temporal filters rather than spatio-temporal filters.

Posturo-Temporal Filters

Neural responses are first normalized as described by


 * $$\nu_\psi(t) = \frac{R_\psi(t) - \bar{R}}{\bar{R}}$$

where $$R_y(t)$$ describes the neural response; $$_\psi$$ describes the preferred posture at time $$t$$; $$\bar{R}$$ describes the mean neural response over all neurons over $$t$$; and $$n_y(t)$$ describes the normalized response. The filters are defined for forward and backward walking ($$g^f, g^b$$ respectively). The response of the posturo-temporal filter is described


 * $$r_\psi(\tau) = \sum_{t=0ms}^\tau \sum_{p=1}^{100} g_{\tau, \psi} (t, p) \sdot \nu_\psi(t)$$

where $$r$$ is the response of the filter at time $$\tau$$; and $$p$$ describes the posture dimension. The response of the filter is normalized by


 * $$N_\psi(\tau) = \max \left [ \left ( \frac{r_\psi(\tau)}{\sum_t \sum_p g_{\tau, \psi} (t, p)^2} \right ), 0 \right ]$$

where $$N$$ describes the response of the neuron selecting body motion. Finally, body motion is calculated by


 * $$\varepsilon_\psi(\tau) = N_\psi^F(\tau)^2 - N_\psi^B(\tau)^2$$

where $$\varepsilon$$ describes body motion energy.

Statistical Analysis and Psychophysical Experiments
The following model suggests that biological motion recognition could be accomplished through the extraction of a single critical feature: dominant local optic flow motion. These following assumptions were brought about from results of both statistical analysis and psychophysical experiments.

First, Principal component analysis was done on full body 2d walkers and point light walkers. The analysis found that dominant local optic flow features are very similar in both full body 2d stimuli and point light walkers (Figure 1). Since subjects can recognize biological motion upon viewing a point light walker, then the similarities between these two stimuli may highlight critical features needed for biological motion recognition.

Through psychophysical experiments, it was found that subjects could recognize biological motion using a CFS stimulus which contained opponent motion in the horizontal direction but randomly moving dots in the horizontal direction (Figure 2). Because of the movement of the dots, this stimulus could not be fit to a human skeleton model suggesting that biological motion recognition may not heavily rely on form as a critical feature. Also, the psychophysical experiments showed that subjects similarly recognize biological motion for both the CFS stimulus and SPS, a stimulus in which dots of the point light walker were reassigned to different positions within the human body shape for every nth frame thereby highlights the importance of form vs the motion (Fig.1.). The results of the following psychophysical experiments show that motion is a critical feature that could be used to recognize biological motion.

The following statistical analysis and psychophysical experiments highlight the importance of dominant local motion patterns in biological motion recognition. Furthermore, due to the ability of subjects to recognize biological motion given the CFS stimulus, it is postulated that horizontal opponent motion and coarse positional information is important for recognition of biological motion.

Model
The following model contains detectors modeled from existing neurons that extracts motion features with increasing complexity. (Figure 4).

Detectors of Local Motion

These detectors detect different motion directions and are modeled from neurons in monkey V1/2 and area MT The output of the local motion detectors are the following:
 * $$G_p(x) = H(v(x),v_1,v_2) \cdot b(\theta,\theta_p) $$

where $$x$$ is the position with preferred direction $$\theta_p,$$, $$v$$ is the velocity, $$\theta$$ is the direction, and $$H$$ is the rectangular speed tuning function such that


 * $$H(v,v_1,v_2) = 1 $$ for $$ v_1 < v < v_2 $$ and $$ H(v,v_1,v_2) = 0 $$ otherwise.

The direction-tuning of motion energy detectors are given by


 * $$ b(\theta,\theta_p)=\left\{ \left ( \frac{1}{2} \right ) \left [\ 1 + cos(\theta,\theta_p)\right]\ \right\}^q $$

where $$q$$ is a parameter that determines width of direction tuning function. (q=2 for simulation).

Neural detectors for opponent motion selection

The following neural detectors are used to detect horizontal and vertical opponent motion due by pooling together the output of previous local motion energy detectors into two adjacent subfields. Local motion detectors that have the same direction preference are combined into the same subfield. These detectors were modeled after neurons sensitive to opponent motion such as the ones in MT and medial superior temporal (MST). Also, KO/V3B has been associated with processing edges, moving objects, and opponent motion. Patients with damage to dorsal pathway areas but an intact KO/V3B, as seen in patient AF can still perceive biological motion.

The output for these detectors are the following:


 * $$o_l(x)=\sqrt{max(g_p(x_i))max(g_r(x_j))}$$

where $$ x $$ is the position the output is centered at, direction preferences $$ p $$ and $$ r $$, and $$i,j$$ signify spatial positions of two subfields.

The final output of opponent motion detector is given as


 * $$o_l(x)=max(o_l(x_k))$$

where output is the pooled responses of detectors of type $$ l $$ at $$ x_k $$ different spatial positions.

Detectors of optic flow patterns

Each detector looks at one frame of a training stimulus and compute an instantaneous optic flow field for that particular frame. These detectors model neurons in Superior temporal sulcus and Fusiform face area

The input of these detectors is arranged from vector u and are comprised from the previous opponent motion detectors' responses. The output is the following:


 * $$G(u) = e^{(u-u_0)^T C(u-u_0)}$$

such that $$u_0$$ is the center of the radial basis function for each neuron and $$ C $$ is a diagonal matrix which contains elements that have been set during training and correspond to vector u. These elements equal zero if the variance over training doesn't exceed a certain threshold. Otherwise, these elements equal the inverse of variance.

Since recognition of biological motion is dependent on the sequence of activity, the following model is sequence selective. The activity of the optic flow pattern neuron is modeled by the following equation of


 * $$\tau H_k^l(t) = -H_k^l(t) + \sum_m w(k-m)f(H_k^l(t)+G_k^l(t))$$

in which $$k$$ is a specific frame in the $$l$$-th training sequence, $$\tau$$ is the time constant.$$f(H)$$ a threshold function, $$w(m)$$ is an asymmetric interaction kernel, and $$G_k^l(t)$$ is obtained from the previous section.

Detectors of complete biological motion patterns The following detectors sum the output of the optic flow pattern detectors in order to selectively activate for whole movement patterns (e.g. walking right vs. walking left). These detectors model similar neurons that optic flow pattern detectors model:

Superior temporal sulcus and Fusiform face area

The input of these detectors are the activity of the optic flow motion detectors, $$H_l^l(t)$$. The output of these detectors are the following:


 * $$\tau_s P^l (t) = -P^l(t) + \sum_k H_l^l(t)$$

such that $$P^l(t)$$ is the activity of the complete biological motion pattern detector in response to pattern type $$ l $$ (e.g. walking to the left), $$ \tau_s $$ equals the time constant (used 150 ms in simulation), and $$H_k^l(t)$$ equals the activity of optic flow pattern detector at kth frame in sequence l.

Testing the model
Using correct determination of walking direction of both the CFS and SPS stimulus, the model was able to replicate similar results as the psychophysical experiments. (could determine walking direction of CFS and SPS stimuli and increasing correct with increasing number of dots). It is postulated that recognition of biological motion is made possible by the opponent horizontal motion information that is present in both the CFS and SPS stimuli.