User:JinOuKim/sandbox/Computational Vision

Computational vision is concerned with modeling and replicating human vision using computer hardware and software. The term also refers to the study of how to reconstruct, interpret, and understand a 3-Dimensional scene from its 2-Dimensional images in terms of the properties of the structures present in the scene. It draws on knowledge from computer science, electrical engineering, mathematics, physiology, biology, and cognitive science. The goal of computational vision is to build systems that can consistently interpret or describe the visual environment under almost any operating conditions, that is, to reproduce the performance of human visual perception. Future computational vision models are expected to draw on physics and computer graphics. Both of these fields model how objects move and animate, and how light reflects off their surfaces, is scattered by the atmosphere, is refracted through camera lenses, and is projected onto a flat image plane.

History

 * 1970s - High-level shape models (generalized cylinders, superquadrics, geons, volumetric abstractions), idealized images, simple textureless objects, blocks-world-like scenes. Salient contours map to surface discontinuities.
 * 1980s - Mid-level shape models (polyhedra, CAD models, low-level geometric invariants, 3-Dimensional or view-based 2-Dimensional geometric templates), more complex textureless objects, well-defined geometric structure. Salient contours map to polyhedral edges, image corners to polyhedral vertices.
 * 1990s - Low-level image-based appearance models (pixel-based templates, eigenspaces), most complex objects, full texture, restricted scenes. Pixels in the image correspond to pixels in the model.
 * 2000s - Appearance-based abstractions of local neighborhoods (SIFT, affine-invariant patches, phase-based patches, shape contexts), most complex objects, robustness to noise, occlusion, articulation, and minor within-class variation. The appearance of the image is still very close to the appearance of the model.

Object Recognition
The main issues in object recognition arise from generality and from the number of objects. With respect to generality, a machine cannot easily determine whether an object is 2-Dimensional or 3-Dimensional. In addition, the range of viewing conditions and the segmentation or categorization of biological parts are not well defined. Machines also sometimes fail to count the correct number of objects.

Edge Detection
Edge detection is important because the success of higher-level processing relies on good edges. Gray-level images contain a large amount of data, much of which is irrelevant. The initial step is therefore to reduce the data: the object is separated from the background, and the edges that are physically significant are identified. Edge information plays a key role in the selection of tokens. In motion analysis, moving objects are detected by identifying time-varying edges and corners. Object recognition methods based on 2-Dimensional shape also rely on edge detection. Edge detection has three stages: filtering, differentiation, and detection.

Filtering
During the filtering stage, the image is passed through a filter to remove noise. The noise can come from undesirable effects introduced by sampling, quantization, blurring and defocusing of the camera, and irregularities in the surface structure of the objects. The simplest filter is the mean filter, in which the gray level at each pixel is replaced by the average of the gray levels in a small neighborhood around that pixel. This process averages out the noise.
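The mean filter described above can be sketched in a few lines of NumPy. This is only a minimal illustration: the function name `mean_filter`, the default 3 x 3 neighborhood, and the choice to replicate border pixels are assumptions made for the example, not part of the description above.

```python
import numpy as np

def mean_filter(image, size=3):
    """Replace each pixel by the average gray level of its size x size neighborhood."""
    image = np.asarray(image, dtype=float)
    pad = size // 2
    # Assumption: border pixels are replicated so the output keeps the input shape.
    padded = np.pad(image, pad, mode="edge")
    out = np.zeros_like(image)
    rows, cols = image.shape
    # Add up one shifted copy of the image per neighborhood offset.
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + rows, dx:dx + cols]
    return out / (size * size)
```

Larger neighborhoods average out more noise but also blur the edges that the later stages are trying to find, so the size is a trade-off.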

Differentiation
The differentiation stage highlights the locations in the image where intensity changes are significant. In early approaches the filtering stage was skipped: differentiation was performed with finite differences, and detection was carried out by locating the peaks in the gradient of the intensity function using a threshold. In those approaches filtering was not important because only synthetic images and industrial scenes with a controlled environment were considered. For other images, filtering first (for example by taking the mean) gives better data for differentiation. The choice of threshold depends on the application domain and varies from image to image, and automatic threshold selection remains a difficult problem.
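A finite-difference differentiation step of the kind mentioned above might look like the following sketch. The central-difference stencil, the zero-valued border, and the name `gradient_magnitude` are assumptions chosen for illustration; other finite-difference schemes would serve equally well.

```python
import numpy as np

def gradient_magnitude(image):
    """Approximate the magnitude of the intensity gradient with central differences."""
    image = np.asarray(image, dtype=float)
    gx = np.zeros_like(image)
    gy = np.zeros_like(image)
    # Central differences in the interior; border pixels are left at zero (assumption).
    gx[:, 1:-1] = (image[:, 2:] - image[:, :-2]) / 2.0
    gy[1:-1, :] = (image[2:, :] - image[:-2, :]) / 2.0
    return np.hypot(gx, gy)
```

Large values in the returned array mark the locations where the intensity changes quickly.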

Detection
During the detection stage, the points where the intensity changes are significant are localized: the peaks in the derivative output are located and taken as the edge points.
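One simple way to localize such peaks is sketched below. It is a simplification under stated assumptions: the input is taken to be an array of derivative (gradient) magnitudes, a pixel counts as a peak if it is a local maximum along its row or along its column, and the name `detect_edge_points` and the tie-breaking rule are hypothetical choices for this example.

```python
import numpy as np

def detect_edge_points(magnitude, threshold):
    """Mark pixels above the threshold that are local maxima along a row or a column."""
    m = np.asarray(magnitude, dtype=float)
    # Replicate the border so every pixel has four neighbors to compare with.
    p = np.pad(m, 1, mode="edge")
    c = p[1:-1, 1:-1]
    left, right = p[1:-1, :-2], p[1:-1, 2:]
    up, down = p[:-2, 1:-1], p[2:, 1:-1]
    # Strict on one side and non-strict on the other, so that a flat
    # two-pixel-wide peak yields a single edge point rather than two.
    peak_in_row = (c > left) & (c >= right)
    peak_in_col = (c > up) & (c >= down)
    return (c > threshold) & (peak_in_row | peak_in_col)
```

The threshold is passed in explicitly because, as noted above, a suitable value depends on the application and on the image.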

Motion Detection
Optical flow estimation relies on local information, that is, on a small part of the image. The optical flow can be used to compute 3-Dimensional motion, such as translation and rotation, and 3-Dimensional shape. Motion information can be extracted at a given pixel only if the spatial gradient there is non-zero, so explicit information is available only inside non-homogeneous regions of the image. Even there, the information is incomplete: it provides only a partial view of the underlying motion, which depends on the orientation of the spatial gradient at that pixel.
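The partial nature of this information can be illustrated with the standard brightness-constancy constraint Ix*u + Iy*v + It = 0, where (Ix, Iy) is the spatial gradient, It the temporal derivative, and (u, v) the unknown flow. The constraint is not stated explicitly above and is assumed here. From one pixel, only the flow component along the gradient can be recovered, as this sketch shows; the function name `normal_flow` and the tolerance `eps` are hypothetical.

```python
def normal_flow(ix, iy, it, eps=1e-9):
    """Return the flow component along the spatial gradient, or None if the gradient vanishes."""
    g2 = ix * ix + iy * iy
    if g2 < eps:
        # Homogeneous region: no motion information is available at this pixel.
        return None
    # Point on the constraint line Ix*u + Iy*v + It = 0 that is closest to the origin.
    scale = -it / g2
    return (scale * ix, scale * iy)
```

For example, with a true flow of (2, 1) and a gradient of (1, 0), the temporal derivative is -2 and the function returns only the component (2.0, 0.0); the motion along the edge is not observable from this pixel alone.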

Schunck Method
The Schunck method is a simple method for computing optical flow that uses multiple constraints. Because the gray level at a single pixel gives only one constraint, the optical flow can lie anywhere on the straight line defined by the spatial and temporal derivatives. If a second constraint from a neighboring pixel is used, the correct optical flow can be determined by computing the intersection of the two lines represented by the constraints. In general it is desirable to employ multiple constraints. Schunck's method uses eight constraints obtained from the points in a 3 x 3 neighborhood around the pixel, which gives eight intersections of lines. If the measurements contain no noise and all pixels in the 3 x 3 neighborhood belong to the same moving object, then, in principle, all eight straight lines intersect the central pixel's line at a single point, which is the correct optical flow.
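The intersection step described above can be sketched as follows. This covers only that step, not the full method, and rests on assumptions: each constraint is written as the line Ix*u + Iy*v + It = 0 and passed as a triple (Ix, Iy, It), and the function names and the parallel-line tolerance are hypothetical.

```python
import numpy as np

def intersect_constraint_lines(c0, c1, tol=1e-12):
    """Intersect two lines Ix*u + Iy*v + It = 0, each given as (Ix, Iy, It)."""
    a = np.array([c0[:2], c1[:2]], dtype=float)
    b = -np.array([c0[2], c1[2]], dtype=float)
    if abs(np.linalg.det(a)) < tol:
        # Parallel gradients: the two lines do not meet in a single point.
        return None
    return np.linalg.solve(a, b)

def neighborhood_intersections(center, neighbors):
    """Intersect the central pixel's constraint line with each neighbor's line."""
    points = [intersect_constraint_lines(center, n) for n in neighbors]
    return [p for p in points if p is not None]
```

With noise-free derivatives from a single moving object, every returned point coincides with the true flow. With real data the points scatter, and some rule for combining them is needed.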

Structure from Motion
Structure from motion (SFM) in computational vision refers to recovering the physical properties of the objects present in the scene, such as their 3-Dimensional structure and motion, from a series of 2-Dimensional projections. There are two classes of SFM methods: displacement methods and instantaneous methods.

Displacement methods
The 3-Dimensional coordinates of points on the moving objects, together with their 3-Dimensional motion, are recovered from a sequence of frames.

Instantaneous methods
In instantaneous methods, the optical flow is used to recover the 3-Dimensional motion and the depth values.

Activity Recognition
Most studies of activity recognition in computer vision focus on sub-problems such as tracking and motion detection, and often lack a second layer that spans a variety of perceptual components and fuses and interprets their outputs. As a result, it is hard to develop a framework for high-level human activity recognition.

Deep Hierarchy
A deep hierarchy system processes an image systematically, stage by stage. Deep hierarchies are beneficial for computational efficiency and for generalization. The computer vision hierarchy is divided into three levels of vision: low, intermediate, and high. The levels build on top of each other, exploiting the fact that features can be shared among more complex compositions. Sharing means that common computations are reused, which brings computational efficiency. In addition, reusing the commonalities between objects' models places their representations in relation to those of other objects, which leads to high generalization capability and lower storage demands. Although neurophysiological evidence suggests that the human visual system realizes a number of such levels, the design and learning of deep hierarchical systems is known to be a very difficult task.

Low-level vision
Low-level vision processes the image to extract features such as edges, corners, and optical flow. The present implementation of the low-level processing module is a logical, conventional one built on the virtual image system, which is used as a tool for investigating low-level visual processes. The system is currently implemented on a Unix workstation under X-Windows. The computational efficiency of low-level processing tasks can be greatly improved by adopting a computational model that is better suited to the data structure and to the operations that must be performed on it.

Intermediate-level vision
Intermediate-level vision recognizes objects and interprets the 3-Dimensional scene using the features obtained from low-level vision.

High-level vision
High-level vision interprets the evolving information provided by intermediate-level vision and directs which intermediate- and low-level vision tasks should be performed. It may also include a conceptual description of the scene, such as activity, intention, and behavior. At this level, the volumetric representation constitutes the history of the visual process. It contains the information about the 3-Dimensional structure of the scene as obtained by the peripheral processing stages and then modified by perceptual and conceptual reasoning. Segmentation by regions and by edges further characterizes some parts of the volumetric representation.

Flat Processing Scheme
In a flat processing scheme, simple feature-based descriptors are taken as input and processed by task-dependent algorithms.

Fields

 * Image processing - focuses on image manipulation to enhance image quality, to restore an image or to compress/decompress an image.
 * Pattern recognition - studies techniques such as statistical methods, neural networks, and support vector machines to recognize or classify different patterns. Pattern recognition techniques are widely used in computer vision.
 * Photogrammetry - is concerned with obtaining accurate and reliable measurements from images. It focuses on accurate mensuration. Camera calibration and 3-Dimensional reconstruction are two areas of interest to both computer vision and photogrammetry researchers.

Application

 * Robotics - Localization (determining the robot's location automatically), Obstacle avoidance, Navigation and visual servoing, Assembly (peg-in-hole, welding, painting), Manipulation (PUMA robot manipulator), Human Robot Interaction (HRI): intelligent robots that interact with and serve people
 * Medicine - Classification and detection (lesion or cell classification and tumor detection), 2- and 3-Dimensional segmentation, 3-Dimensional human organ reconstruction (MRI/ultrasound), Vision-guided robotic surgery
 * Security - Biometrics (iris, fingerprint, face recognition), Surveillance (detecting certain suspicious activities or behaviors)
 * Transportation - Autonomous vehicles, Safety (driver vigilance monitoring)
 * Industrial automation - Industrial inspection (defect detection and mensuration), Assembly, Barcode and package label reading, Object sorting, Document understanding (OCR)
 * Image/video databases
 * Human Computer Interface - Gaze estimation, Facial expression recognition, Head and hand gesture recognition
