Behavior analysis (computer vision) is a research field of computer vision that uses computational methods to predict people's intentions based solely on their visual actions. It is an important and complex process in artificial and computational intelligence, used to help solve many scene-understanding problems. It is a wide-open field of research with many different methods tied to individual problems, such as identifying hand gestures, distinguishing tennis strokes, and recognizing office activities, among others.

Introduction
Understanding a situation from visual perception is a basic ability shared by humans and most other species, and can be performed accurately within milliseconds. Although behaviour analysis has interested computer scientists since the field's beginnings, and a large body of work has accumulated, artificial systems still cannot match the speed and accuracy of natural ones. Approaches to the problem generally share four common tasks, performed in different ways depending on the type of classification system:
 * 1) Detect and segment objects in each frame. This involves the area of object recognition.
 * 2) Determine the relative position and orientation of each object.
 * 3) Identify a meaningful sequence of frames from the visual input. This is the core of behaviour analysis, and may involve classifying different kinds of motion (e.g. fighting vs. walking), building models for the various motions, representing human motion as vectors, and comparing their similarity.
 * 4) Store and retrieve past sequences of behaviour for identification of current ones.
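
The four tasks above can be sketched end to end on toy data. The following is a minimal illustrative sketch, not any particular published system: frame differencing stands in for object detection, the centroid of changed pixels stands in for position, the centroid sequence forms the motion representation, and a nearest stored sequence gives the classification. All names and numbers are invented for illustration.

```python
def detect(prev, curr, thresh=10):
    """Task 1: indices of pixels that changed between two frames."""
    return [i for i, (a, b) in enumerate(zip(prev, curr)) if abs(a - b) > thresh]

def position(changed):
    """Task 2: centroid of the changed region (or None if nothing moved)."""
    return sum(changed) / len(changed) if changed else None

def track(frames):
    """Task 3: build the motion sequence from consecutive frames."""
    seq = []
    for prev, curr in zip(frames, frames[1:]):
        pos = position(detect(prev, curr))
        if pos is not None:
            seq.append(pos)
    return seq

def classify(seq, memory):
    """Task 4: nearest stored sequence by summed per-step distance."""
    def dist(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))
    return min(memory, key=lambda label: dist(seq, memory[label]))

# Toy 1-D "frames": a bright pixel moving to the right.
frames = [[0] * 8, [0, 0, 50, 0, 0, 0, 0, 0], [0, 0, 0, 50, 0, 0, 0, 0]]
memory = {"walk right": [2.5, 3.0], "walk left": [5.0, 4.0]}
print(classify(track(frames), memory))  # walk right
```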

Traditional artificial intelligence techniques
Some traditional artificial and computational intelligence techniques are used for classifying human action, many tailored to specific motions. Existing approaches include:
 * Using neural networks to determine human actions. In this method, motion-sequence data are extracted from the videos and processed through the network layer by layer until a result emerges. For example, captured motion data can be fed into a neural network designed to determine whether the event is static, like standing or sitting, or dynamic, like walking or running. Once the event is labelled static or dynamic, a second neural network, either a static-event network or a dynamic-event network, is applied to the same data to classify the action.
 * Rule-based and fuzzy systems. One such approach is a prototype fuzzy system for picture understanding from surveillance cameras. The model is split into three parts: a pre-processing module, a static-object fuzzy system module, and a dynamic temporal fuzzy system module. The static fuzzy system takes in the pre-processed data and outputs the number of people in the scene: a single person, two people, three people, many people, or no people. The dynamic fuzzy system determines the intent of the person or people based on their global temporal movements. Although this yields only a basic understanding of human intent, using global movements of people and their interactions based on global positions, it appears in many applied research programs.
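
The two-stage static/dynamic routing described for the neural-network approach can be illustrated with a deliberately simplified stand-in. This sketch is not the cited system: real networks would learn their weights from data, whereas here hard-coded thresholds on invented features (`motion_energy`, `speed`, `height`) play the role of each trained stage.

```python
def gate(motion_energy):
    """Stage 1: decide whether the event is static or dynamic."""
    return "dynamic" if motion_energy > 0.5 else "static"

def static_net(features):
    """Stage 2a: classify static events, e.g. standing vs. sitting."""
    return "standing" if features["height"] > 1.2 else "sitting"

def dynamic_net(features):
    """Stage 2b: classify dynamic events, e.g. walking vs. running."""
    return "running" if features["speed"] > 2.0 else "walking"

def classify(features):
    """Route the same feature data to the appropriate second-stage net."""
    if gate(features["motion_energy"]) == "static":
        return static_net(features)
    return dynamic_net(features)

print(classify({"motion_energy": 0.8, "speed": 3.1, "height": 1.7}))  # running
print(classify({"motion_energy": 0.1, "speed": 0.0, "height": 0.9}))  # sitting
```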

Non-traditional artificial intelligence techniques
A large amount of research uses visual cues of human actions without any traditional artificial or computational intelligence techniques. These algorithms favour simplicity at the cost of fusing input data, and often use atypical inputs, that is, inputs a human observer would not necessarily use. They rely almost exclusively on pre-processing of the data, applying statistical or other non-traditional algorithms to determine the behaviour.

Because different people perform the same action in different ways, and even a single person can perform the same task in different fashions, a good representation of similarity is needed. Specifically, a successful system requires a distance between motion vectors that reflects the real differences among behaviours. Instead of the traditional Euclidean or Hamming distance for comparing the similarity of motion vectors, much research uses other distance measures, e.g. a modified Chamfer distance or the Mahalanobis distance.
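
The point of a measure like the Mahalanobis distance is that it scales each dimension by its variance, so noisy components of a motion vector count for less. The sketch below assumes a diagonal covariance for simplicity (the general form uses the full inverse covariance matrix); the vectors and variances are made-up illustrations.

```python
import math

def mahalanobis_diag(u, v, variances):
    """Mahalanobis distance between motion vectors u and v, assuming a
    diagonal covariance (per-dimension variances estimated from training)."""
    return math.sqrt(sum((a - b) ** 2 / s for a, b, s in zip(u, v, variances)))

# Two toy motion vectors; the second dimension is much noisier, so the
# larger raw gap there counts for no more than the small gap in the first.
u, v = [1.0, 10.0], [2.0, 20.0]
variances = [1.0, 100.0]
print(mahalanobis_diag(u, v, variances))  # 1.414..., treats both gaps equally
print(math.dist(u, v))                    # Euclidean: 10.04..., dominated by dim 2
```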

Eigenspaces can also be used to categorize actions from distance computations that identify events. Motion from a camera is captured, manually placed into classes, and assembled into a covariance matrix; this makes up the universal image set for that action. Eigenvalues and eigenvectors are calculated, and by the Karhunen–Loève theorem the ones that best describe the action are kept, forming an orthogonal coordinate system. To recognize an unknown image sequence, a distance measure is applied to its projection onto this coordinate system.
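
A minimal instance of this eigenspace construction, using invented 2-D "motion features" per frame so the eigendecomposition can be done by hand: build the covariance matrix, take the eigenvector with the largest eigenvalue (the dominant Karhunen–Loève basis direction), and project each frame onto it. A real system would do this on high-dimensional image data and keep several eigenvectors.

```python
import math

# Toy 2-D feature per frame; real data would be high-dimensional images.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.0)]
n = len(points)
mx = sum(p[0] for p in points) / n
my = sum(p[1] for p in points) / n

# 2x2 covariance matrix [[a, b], [b, c]].
a = sum((x - mx) ** 2 for x, _ in points) / n
c = sum((y - my) ** 2 for _, y in points) / n
b = sum((x - mx) * (y - my) for x, y in points) / n

# Eigenvalues of a symmetric 2x2 matrix via the quadratic formula;
# keep the larger one (the direction that best describes the motion).
disc = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
lam_max = (a + c) / 2 + disc

# Corresponding unit eigenvector, then each frame's coordinate on it.
ev = (b, lam_max - a)
norm = math.hypot(*ev)
ev = (ev[0] / norm, ev[1] / norm)
scores = [(x - mx) * ev[0] + (y - my) * ev[1] for x, y in points]
print(scores)
```

An unknown sequence would be projected the same way and compared to stored projections with a distance measure.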

In another approach, manifolds from isometric feature mapping (Isomap) are used to represent individual images of motion sequences. Isomap reduces the dimensionality of the images while keeping most of the features required for classification. Scores for the manifolds are calculated, and the resulting curves are either stored (during training) or compared against classified events using dynamic time warping (DTW). Classification is by nearest neighbour.
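
DTW is what lets two score curves match even when one action is performed faster than the other. A minimal implementation on 1-D curves, with invented training curves, shows the nearest-neighbour step:

```python
def dtw(s, t):
    """Dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(s), len(t)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# Invented training curves for two gestures.
training = {"wave": [0, 1, 2, 1, 0], "point": [0, 2, 2, 2, 2]}
query = [0, 1, 1, 2, 1, 0]  # a "wave" performed slightly more slowly

# Nearest neighbour under DTW distance.
label = min(training, key=lambda k: dtw(query, training[k]))
print(label)  # wave
```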

Using Markov models and Bayesian networks
Developing Markov models or Bayesian networks is a common approach in the research. This path fits the logical view of an action as a sequence of images: each image is considered together with its successor, similar to how a human recognizes actions. There are several ways to develop a network from the input data. One is to use a Coupled Hierarchical Durational-State Dynamic Bayesian Network (CHDS-DBN) to model human actions; the motivation is that understanding human actions requires modelling both the motion corresponding to the interaction and the details of the motion at different scales. To represent behaviours more flexibly than other common classification models allow for data with large temporal scale, one can also apply variable-length Markov models.
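
The core sequence-scoring idea behind these models can be shown with a much simpler stand-in: a first-order Markov chain per action over a small discrete pose alphabet, where an observed sequence is assigned to the action whose transition probabilities give it the highest log-likelihood. The cited CHDS-DBN and variable-length Markov models are far richer; the poses and probabilities below are invented.

```python
import math

# One transition table per action over poses "L" (left foot forward)
# and "R" (right foot forward); numbers are illustrative only.
models = {
    "walk":  {("L", "R"): 0.9, ("R", "L"): 0.9, ("L", "L"): 0.1, ("R", "R"): 0.1},
    "stand": {("L", "R"): 0.1, ("R", "L"): 0.1, ("L", "L"): 0.9, ("R", "R"): 0.9},
}

def log_likelihood(seq, trans):
    """Score a pose sequence under one action's transition model."""
    return sum(math.log(trans[(a, b)]) for a, b in zip(seq, seq[1:]))

observed = ["L", "R", "L", "R", "L"]  # alternating poses
best = max(models, key=lambda m: log_likelihood(observed, models[m]))
print(best)  # walk
```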

Using Grammars
In constructing networks, many researchers use grammars to describe the sequence of events the body makes when determining a person's actions visually. This is done by breaking human behaviour down into basic elements, e.g. waving arms or opening the mouth, then interpreting the meaning of motions from how they are constructed from those elements (for example, "waving arms" + "speaking" may be interpreted as "greeting"). Grammars are mathematically grounded and fit well with visual action understanding because of their network-like way of solving problems. A probabilistic context-free grammar (PCFG), also called a stochastic context-free grammar, can be used for short action sequences of a person in video. In one possible implementation of this approach, body poses are stored as silhouettes, which are used to construct the PCFG. Pairs of frames are formed by time slot: the body poses from frames 1 and 2 are paired, those from frames 2 and 3 are paired, and so on. These pairs construct the PCFG for the given action. When testing the algorithm, the same procedure is followed, and the test data are compared with the trained data via Bayes' theorem: $$P(s_k|p_i) = P(p_i|s_k)P(s_k)/P(p_i)$$, where $$s_k$$ is the $$k$$th silhouette and $$p_i$$ is the $$i$$th pose.
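
The Bayes-rule step above can be computed directly once likelihoods and priors are in hand. In this toy instance the three silhouettes and all probabilities are made-up numbers; a real system would estimate them from the PCFG training pairs.

```python
priors = {"s1": 0.5, "s2": 0.3, "s3": 0.2}         # P(s_k)
likelihood = {"s1": 0.10, "s2": 0.60, "s3": 0.30}  # P(p_i | s_k)

# P(p_i), the evidence, by total probability over all silhouettes.
evidence = sum(likelihood[s] * priors[s] for s in priors)

# Posterior P(s_k | p_i) = P(p_i | s_k) P(s_k) / P(p_i) for each silhouette.
posterior = {s: likelihood[s] * priors[s] / evidence for s in priors}
print(max(posterior, key=posterior.get))  # s2
```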

In another setting, phrase grammars were used to distinguish types of actions from hand signals, which can be chained together to form the meaning of a sentence signed in American Sign Language. Here the phrase grammar limits the search set of words, improving the accuracy of what is being described and speeding up the process compared with not using grammars. A hidden Markov model is used to train and test the data.

Using Traditional Hidden Markov Models
Of all the networks constructed for visual human action recognition, hidden Markov models (HMMs) are the most widely used. HMMs maintain a network of body poses related to each other and provide a way of learning the parameters that best fit a set of training data with known classifications.
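
The basic HMM computation for recognition is the forward algorithm: the probability of an observed sequence under a model whose hidden states are body poses. Parameter learning (e.g. Baum–Welch) is omitted here, and all state names and probabilities are illustrative.

```python
def forward(obs, states, start, trans, emit):
    """Forward algorithm: P(obs) under the given HMM."""
    # Initialize with the start distribution and first observation.
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    # Propagate: sum over all paths into each state at each step.
    for o in obs[1:]:
        alpha = {s: emit[s][o] * sum(alpha[p] * trans[p][s] for p in states)
                 for s in states}
    return sum(alpha.values())

states = ["standing", "walking"]
start = {"standing": 0.6, "walking": 0.4}
trans = {"standing": {"standing": 0.7, "walking": 0.3},
         "walking":  {"standing": 0.2, "walking": 0.8}}
emit = {"standing": {"still": 0.9, "moving": 0.1},
        "walking":  {"still": 0.2, "moving": 0.8}}

print(forward(["still", "moving", "moving"], states, start, trans, emit))
```

For classification, one such model is trained per action, and an unknown sequence is assigned to the model giving it the highest probability.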

Using Non-Traditional Hidden Markov Models
Other forms of HMMs have been developed to handle more specific problems associated with HMM-based action recognition systems.
 * The Parametric Hidden Markov Model (PHMM) has an additional parameter representing meaningful variations of a gesture across the set of all gestures. This gives PHMMs the ability to distinguish between gesture meanings with similar hand movements.
 * For detecting and classifying interactions between people, the Coupled Hidden Markov Model (CHMM) can outperform the standard HMM, since the standard HMM models a single automaton whereas the CHMM models coupled automatons.
 * Multi-Observation Hidden Markov Models (MOHMMs) allow for continual changes based on changes in people's movements, so unsupervised learning can be used to continually update the model.
 * The Dynamic Multi-Linked HMM (DML-HMM) is based on salient dynamic inter-linkages among multiple temporal events, using Dynamic Probabilistic Networks (DPNs). Standard HMMs cannot account for the multiple processes involved, so the DML-HMM was designed to handle a multitude of different object events. Its topology is determined by causality and temporal order, derived automatically using factorization based on the Schwarz Bayesian Information Criterion. Instead of being fully connected like the CHMM, the DML-HMM aims to connect only a subset of relevant hidden state variables across multiple temporal processes. Compared with a Multi-Observation HMM (MOHMM), a Parallel HMM (PaHMM), and a CHMM, the DML-HMM performs better, since the CHMM and the MOHMM propagate noise through the system and the PaHMM discards correlations between multiple temporal processes.

Other variants of hidden Markov models include continuous HMMs (cHMMs), layered hidden Markov models (LHMMs), the observation-decomposed hidden Markov model (ODHMM), and the evidence feed-forward HMM.