
A spatial-temporal pattern is a recurring pattern of events or objects, characterized by the topological, geographic, or geometric properties of entities that appear in a sequence of data ordered by time. The term spatial pattern in this context may refer to points, regions, line segments, curves, or any other interesting space-related patterns that can be observed in the data. Spatial-temporal patterns are often used in computer vision for automatic categorization and localization of human actions in video sequences.

Overview
Spatial-temporal patterns are normally used to solve space-time problems. The space-time problem can be formulated as follows: suppose we observe spatial data at each of $$ m $$ time points, i.e., { $$ [ Z(s_{1_i}, t_i), \ldots, Z(s_{n_i}, t_i)]: i = 1, \ldots, m $$ }. Here $$ s_{1_i}, \ldots, s_{n_i} $$ are the $$ n_i $$ data locations at time $$ t_i $$, and $$ t_1 < t_2 < \ldots < t_m $$. In other words, we are interested in how to predict or recognize the appearance of certain important points in the future by studying the spatial-temporal patterns in the given data.
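As a minimal illustration of this layout, the observations could be stored as a list of (time, locations, values) triples, one per time point. This is a hypothetical sketch; the coordinates and values below are made up.

```python
import numpy as np

# At each of m = 3 time points we observe values Z at n_i spatial
# locations; the number of locations may differ between time points.
observations = [
    # (t_i, locations s_{1..n_i} as 2-D coordinates, values Z(s_j, t_i))
    (0.0, np.array([[0.0, 0.0], [1.0, 2.0]]), np.array([3.1, 4.7])),
    (1.0, np.array([[0.5, 0.5]]), np.array([2.9])),
    (2.0, np.array([[0.0, 1.0], [2.0, 2.0], [1.0, 1.0]]), np.array([5.0, 4.2, 4.8])),
]

# The time points must be strictly increasing: t_1 < t_2 < ... < t_m.
times = [t for t, _, _ in observations]
assert all(a < b for a, b in zip(times, times[1:]))
```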

Spatial-temporal patterns are often used in computer vision as features to represent different categories of actions or events in order to recognize certain actions or events that we are interested from the video sequences. Analysis on spatial-temporal patterns also has diverse applications in other fields: ecologists use it to interpret and predict landscape changes and natural disasters; physicists use it to study the placement of galaxies in the cosmos.

Spatial-temporal interest points
Spatial-temporal patterns are normally represented by a collection of spatial-temporal interest points, which are sets of points that distinguish one class from others in the space and time domains. Various techniques have been proposed to represent spatial-temporal interest points, such as spatial-temporal corners, periodic spatial-temporal features, volumetric features, and spatial-temporal regions of high entropy.

 * Spatial-temporal corners

Spatial-temporal corners were first proposed by Ivan Laptev and Tony Lindeberg. They detect interest points that are simultaneously maxima of their proposed spatial-temporal corner function and extrema of the normalized spatial-temporal Laplace operator. Thus, they can detect interest points for a set of sparsely distributed scale values and then track these points in both the space and time domains. (For more details on the corner function and the Laplace operator, please refer to the paper.)

Their work provides automatic scale selection for the spatial-temporal interest points, but spatial-temporal corners can be quite rare and thus too sparse for many types of motion.
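The core idea can be sketched as a Harris-style corner function extended to time: build the 3×3 spatial-temporal second-moment matrix from the space and time derivatives and score each point by its determinant minus a multiple of its cubed trace. This is an illustrative simplification, not Laptev and Lindeberg's exact detector; the `sigma`, `tau`, and `k` values are assumed parameters.

```python
import numpy as np
from scipy import ndimage

def st_corner_response(video, sigma=1.5, tau=1.5, k=0.005):
    """Simplified spatial-temporal corner response for a (T, H, W) video."""
    # Smooth over time (tau) and space (sigma).
    v = ndimage.gaussian_filter(video.astype(float), sigma=(tau, sigma, sigma))
    # Derivatives along the time (t), vertical (y), and horizontal (x) axes.
    Lt, Ly, Lx = np.gradient(v)
    # Locally average the products of derivatives (second-moment matrix mu).
    g = lambda a: ndimage.gaussian_filter(a, sigma=(2 * tau, 2 * sigma, 2 * sigma))
    xx, yy, tt = g(Lx * Lx), g(Ly * Ly), g(Lt * Lt)
    xy, xt, yt = g(Lx * Ly), g(Lx * Lt), g(Ly * Lt)
    # det(mu) for the symmetric 3x3 matrix [[xx,xy,xt],[xy,yy,yt],[xt,yt,tt]].
    det = xx * (yy * tt - yt * yt) - xy * (xy * tt - yt * xt) + xt * (xy * yt - yy * xt)
    trace = xx + yy + tt
    # Harris-style corner function H = det(mu) - k * trace(mu)^3;
    # local maxima of H are candidate spatial-temporal interest points.
    return det - k * trace**3
```

Points where this response is a local maximum change in all three directions at once, which is why true spatial-temporal corners (e.g. a sudden reversal of motion) are rare.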

 * Periodic spatial-temporal features

This representation of spatial-temporal interest points was also proposed by Ivan Laptev. Compared with spatial-temporal corners, periodic spatial-temporal features provide a rich set of features to represent each action class, but they do not provide automatic scale selection.
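One common way to obtain a dense, periodic-motion response is to smooth each frame spatially and then apply a quadrature pair of 1-D temporal Gabor filters. The sketch below is illustrative of that general idea, not necessarily the paper's exact formulation; `sigma`, `tau`, and `omega` are assumed values.

```python
import numpy as np
from scipy import ndimage

def periodic_response(video, sigma=2.0, tau=2.0, omega=0.25):
    """Quadrature-pair Gabor response for a (T, H, W) video (illustrative)."""
    # Spatial Gaussian smoothing only (no temporal smoothing here).
    v = ndimage.gaussian_filter(video.astype(float), sigma=(0, sigma, sigma))
    # Even/odd temporal Gabor filters tuned to frequency omega (cycles/frame).
    t = np.arange(-6, 7)
    env = np.exp(-t**2 / (2 * tau**2))
    g_even = env * np.cos(2 * np.pi * omega * t)
    g_odd = env * np.sin(2 * np.pi * omega * t)
    r_even = ndimage.convolve1d(v, g_even, axis=0)
    r_odd = ndimage.convolve1d(v, g_odd, axis=0)
    # The squared quadrature sum is high wherever intensity varies
    # periodically over time, regardless of phase.
    return r_even**2 + r_odd**2
```

Because almost any periodic or oscillating motion fires this detector, it yields far more interest points than spatial-temporal corners, at the cost of fixing the temporal scale by hand.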

 * Volumetric features

Volumetric features were studied for action event classification by Ke et al. Their initial experiments using pixel intensity performed poorly, mainly because changes in the actor's appearance, the background, and the lighting conditions influence the intensity of the pixels. They therefore computed the volumetric features on the video's optical flow instead, separating the optical flow into its horizontal and vertical components and computing volumetric features on each component.

Volumetric features provide dense features at many locations and scales, but they require processing a video pyramid in order to achieve spatial scale invariance.
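Assuming the optical flow has already been estimated by some external routine and split into its horizontal and vertical components, a volumetric feature is simply the sum of one flow component over a 3-D space-time box, which can be evaluated in constant time from an integral video (the 3-D analogue of an integral image). A minimal sketch:

```python
import numpy as np

def integral_video(v):
    """Prefix sums of a (T, H, W) array over t, y, and x, padded with a
    zero border so that box queries need no boundary special-casing."""
    iv = v.cumsum(0).cumsum(1).cumsum(2)
    return np.pad(iv, ((1, 0), (1, 0), (1, 0)))

def box_sum(iv, t0, t1, y0, y1, x0, x1):
    """Sum of v over the half-open box [t0,t1) x [y0,y1) x [x0,x1),
    via 3-D inclusion-exclusion: 8 lookups regardless of box size."""
    s = 0.0
    for dt, t in ((1, t1), (-1, t0)):
        for dy, y in ((1, y1), (-1, y0)):
            for dx, x in ((1, x1), (-1, x0)):
                s += dt * dy * dx * iv[t, y, x]
    return s
```

In a full pipeline one would build one integral video per flow component (e.g. `flow[..., 0]` and `flow[..., 1]` if the flow is stored as a (T, H, W, 2) array, an assumed layout) and evaluate many boxes at many scales as the feature vector.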

 * Spatial-temporal regions of high entropy

A spatial-temporal region of high entropy is a collection of spatiotemporal events localized at points that are salient in both space and time. The spatiotemporal salient points are detected by measuring the variations in the information content of pixel neighborhoods not only in space but also in time. Oikonomopoulos et al. introduced a distance metric between two collections of spatiotemporal salient points, based on the chamfer distance and an iterative linear time-warping technique, in order to find the salient region.

This approach to selecting spatial-temporal interest points provides automatic scale selection, but examples suggest that high-entropy regions are rare.
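A simple way to realize the underlying measure is to compute the Shannon entropy of the quantized-intensity histogram in a small space-time neighborhood around each point; points whose neighborhoods have high entropy are candidate spatiotemporal salient points. The sketch below is illustrative only, and the neighborhood radius and number of bins are assumed values.

```python
import numpy as np

def local_st_entropy(video, radius=2, bins=8):
    """Shannon entropy (bits) of the intensity histogram in a
    (2r+1)^3 space-time neighborhood around each interior point."""
    T, H, W = video.shape
    # Quantize intensities into `bins` levels (edges are interior cut points).
    edges = np.linspace(video.min(), video.max(), bins + 1)[1:-1]
    q = np.digitize(video, edges)
    out = np.zeros_like(video, dtype=float)
    r = radius
    for t in range(r, T - r):
        for y in range(r, H - r):
            for x in range(r, W - r):
                patch = q[t - r:t + r + 1, y - r:y + r + 1, x - r:x + r + 1]
                counts = np.bincount(patch.ravel(), minlength=bins)
                p = counts[counts > 0] / counts.sum()
                out[t, y, x] = -(p * np.log2(p)).sum()
    return out
```

Uniform, static regions give near-zero entropy, while regions whose content varies across both space and time score highly, which matches the intuition that such regions carry the most information about the action.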

Spatial-temporal analysis
Many techniques, especially from machine learning, have been developed for spatial-temporal analysis. These algorithms can be categorized as supervised learning algorithms, which include the Support Vector Machine, the Time Delay Neural Network, and the Hidden Markov Model, and unsupervised learning algorithms, which include k-means and the Gaussian Mixture Model.

 * Supervised learning algorithms

Supervised learning algorithms for spatial-temporal analysis typically use a set of labeled training data to train a model to recognize human actions. In this approach, we feed the spatial-temporal features extracted from the videos as input, and the labels of the videos as output, to a supervised learning algorithm such as a Support Vector Machine or a Time Delay Neural Network in order to train the parameters of the model. After training the model, we can feed it features detected from a new video in order to recognize human actions.
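This pipeline can be sketched with scikit-learn's SVM classifier. The feature dimensions, labels, and data below are all synthetic stand-ins for per-video descriptors built from spatial-temporal features (e.g. bag-of-features histograms), which are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical setup: each of 60 videos is summarized by a 32-dimensional
# feature vector; y holds the action labels (e.g. 0 = walk, 1 = run).
n_videos, n_features = 60, 32
X = rng.random((n_videos, n_features))
y = rng.integers(0, 2, n_videos)
X[y == 1, :8] += 1.0  # shift a few dimensions so the classes are separable

# Train an SVM on the labeled videos, then recognize actions by feeding
# it features from (here, the same) videos.
clf = SVC(kernel="rbf").fit(X, y)
pred = clf.predict(X)
```

In practice `X` would come from one of the interest-point representations above, and prediction would be run on held-out videos rather than the training set.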


 * Unsupervised learning algorithms

Unlike supervised learning algorithms, unsupervised learning algorithms do not require labeled data for training. They learn clusters from unlabeled data and automatically group different actions for us using techniques such as the expectation–maximization algorithm. After training and building a probabilistic model, actions from a new video can be recognized by assigning its features to the learned clusters.
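As a sketch, scikit-learn's `GaussianMixture` fits such a probabilistic model with EM on unlabeled feature vectors; the synthetic data and the choice of two clusters below are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical unlabeled 16-dimensional feature vectors from two
# different (unnamed) action types, 40 clips each.
X = np.vstack([
    rng.normal(0.0, 0.5, (40, 16)),
    rng.normal(3.0, 0.5, (40, 16)),
])

# EM fits a 2-component Gaussian mixture; each component plays the role
# of one discovered action cluster (no labels are used).
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
clusters = gmm.predict(X)

# Features from a new clip are recognized by assigning them to the
# most likely learned component.
new_clip = rng.normal(3.0, 0.5, (1, 16))
new_cluster = gmm.predict(new_clip)[0]
```

A k-means clustering (`sklearn.cluster.KMeans`) could be substituted for the mixture model when hard assignments suffice.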

Application in Computer Vision
Spatial-temporal patterns are often used in computer vision for automatic categorization and localization of human actions in video sequences. More specifically, the task is to recognize and locate actions in new video sequences by learning models from a collection of videos, and this task can be extended to build a variety of applications.

For example, in a busy public train station, we can build vision systems that automatically detect dangerous actions such as theft. We can also build content-based retrieval systems that allow us to search for videos based on their content rather than tags or keywords.
 * Detect relevant events in surveillance video
 * Retrieve video from large databases