
Voxel colouring is a computer vision technique for volumetric scene reconstruction from a set of input images taken from a wide range of viewpoints. Steven M. Seitz and Charles R. Dyer first introduced voxel colouring in 1997.

As opposed to feature- and contour-based techniques, which focus on reconstructing the shapes of the objects in the scene, voxel colouring treats the problem as one of colour reconstruction. The core idea behind the technique is to find colour invariant voxels that together constitute a complete scene with respect to the set of input images.

The ordinal visibility constraint, a prerequisite for voxel colouring, provides an inherent solution to the problem of occlusion in the input images. It also makes it possible to implement voxel colouring as a one-pass algorithm.

Main concepts
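Voxel colouring assumes that the scene to be reconstructed is approximately Lambertian, i.e. that each scene point has the same colour regardless of the direction from which it is viewed. This is what allows the colours of a voxel's projections in different images to be compared directly.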

Ordinal visibility constraint
In computer vision, the effect of a scene point not being visible in a given image because another scene point obstructs it is called occlusion. Most scene reconstruction techniques cope poorly with significant occlusions in the input images.

The voxel colouring technique solves the problem of occlusion by introducing the ordinal visibility constraint, which Seitz and Dyer define as follows:


 * There exists a norm $$\|\boldsymbol{\cdot}\|$$ such that for all scene points $$P$$ and $$Q$$, and input images $$I$$, $$P$$ occludes $$Q$$ in $$I$$ only if $$\|P\|<\|Q\|$$

This constraint limits the camera configurations that can be used to capture the input images. In general, the ordinal visibility constraint is satisfied if no scene point lies within the convex hull of the camera positions (also called the camera volume). In that case, the norm in the constraint can be taken as the distance from a scene point to the camera volume.

For instance, a set of inward-facing cameras positioned on a circle and capturing a scene inside that circle would be incompatible with the ordinal visibility constraint, because the camera volume (the disc bounded by the circle) contains scene points. However, moving the scene either below or above the camera volume (and tilting the cameras downwards or upwards, respectively) produces a compatible configuration.
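The circular rig can be made concrete with a short Python sketch of the distance from a scene point to its camera volume. It assumes the cameras are so densely spaced on a circle of radius `radius` in the plane $$z =$$ `height`, centred on the z-axis, that their convex hull can be treated as the full disc; the function names are hypothetical. A point at distance 0 lies inside the camera volume and violates the constraint.

```python
import math


def distance_to_camera_disc(point, radius, height):
    """Distance from a 3D point to a disc of the given radius lying in the
    plane z = height and centred on the z-axis.

    Assumption: the disc stands in for the convex hull of a dense ring of
    cameras (the camera volume of the circular rig)."""
    x, y, z = point
    r = math.hypot(x, y)  # radial distance from the rig's axis
    dz = z - height       # signed offset from the camera plane
    if r <= radius:
        # The nearest disc point lies directly above or below the point.
        return abs(dz)
    # Otherwise the nearest disc point lies on the rim of the disc.
    return math.hypot(r - radius, dz)


def satisfies_ordinal_visibility(points, radius, height):
    """True if no scene point lies inside the camera volume (distance 0)."""
    return all(distance_to_camera_disc(p, radius, height) > 0 for p in points)


# A scene point inside the ring of cameras violates the constraint ...
print(satisfies_ordinal_visibility([(0.2, 0.0, 0.0)], radius=1.0, height=0.0))  # False
# ... but placing the camera plane above the scene satisfies it.
print(satisfies_ordinal_visibility([(0.2, 0.0, 0.0)], radius=1.0, height=2.0))  # True
```

Raising the camera plane to `height=2.0` corresponds to moving the scene below the camera volume and tilting the cameras downwards.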

The ordinal visibility constraint makes it possible to partition the scene volume into layers of uniform distance from the camera volume. The definition of the constraint ensures that a voxel in a given layer can only be occluded by voxels in previous layers, which are closer to the camera volume. This property is the key to the voxel colouring technique.

Voxel consistency
For the purposes of voxel colouring, an input image $$I$$ is described as a set of infinitesimally small pixels $$p$$: $$I=\{p\}$$. Similarly, the scene $$S$$ that corresponds to the set of input images is described as a set of voxels $$V$$ (three-dimensional analogues of a pixel): $$S=\{V\}$$. The colour of a pixel $$p \in I$$ is denoted by $$colour(p,\ I)$$ and the colour of a voxel $$V \in S$$ by $$colour(V,\ S)$$.

The voxel $$V \in S$$ that is visible (not occluded) in image $$I$$ and projects to the pixel $$p \in I$$ is denoted by $$V = S(p)$$.

If for every voxel $$V \in S$$ and image pixels $$p_{I_i} \in I_i$$ and $$p_{I_j} \in I_j\ (i, j = 1..N)$$,


 * $$S(p_{I_i}) = S(p_{I_j}) = V \implies colour(p_{I_i}, I_i) = colour(p_{I_j}, I_j) = colour(V, S)$$,

then the set $$S$$ is said to be voxel-consistent with the set of images $$I_1,\dotsc, I_N$$. In other words, if the set $$S$$ is voxel-consistent, then all image projections of each voxel $$V \in S$$ have the same colour as the voxel itself.
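The definition can be checked directly on a small discrete model. In this Python sketch (all names hypothetical), each image is a dictionary from pixel to colour, the voxel colours $$colour(V, S)$$ are a dictionary, and the visibility function $$S(p)$$ is assumed to be precomputed as a dictionary from (image index, pixel) pairs to the visible voxel. Requiring every projection to equal the voxel's own colour also makes all projections of the same voxel equal to each other. Colours are compared exactly, as in the noise-free definition.

```python
def is_voxel_consistent(voxel_colour, images, visible):
    """Return True if every visible projection of every voxel has the
    voxel's own colour.

    voxel_colour: dict mapping voxel V -> colour(V, S)
    images:       list of dicts; images[i][p] is colour(p, I_i)
    visible:      dict mapping (i, p) -> S(p), the voxel of S visible at
                  pixel p of image I_i (pixels with no visible voxel are absent)
    """
    for (i, p), voxel in visible.items():
        if images[i][p] != voxel_colour[voxel]:
            return False
    return True


# Two one-pixel images that both see voxel "A".
images = [{0: "red"}, {0: "red"}]
visible = {(0, 0): "A", (1, 0): "A"}
print(is_voxel_consistent({"A": "red"}, images, visible))   # True
print(is_voxel_consistent({"A": "blue"}, images, visible))  # False
```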

Complete and consistent scenes
If for every image $$I_i\ (i=1..N)$$ and every pixel $$p_{I_i} \in I_i$$ there exists a voxel $$V \in S$$ such that $$V = S(p_{I_i})$$, then the scene $$S$$ is said to be complete with respect to the set of images $$I_i\ (i = 1..N)$$.
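On a discrete toy model, completeness reduces to checking that every pixel has a visible voxel. In this Python sketch (hypothetical names), each image is a dictionary from pixel to colour and `visible` is an assumed precomputed dictionary from (image index, pixel) pairs to $$S(p)$$:

```python
def is_complete(images, visible):
    """Return True if every pixel of every image has a visible voxel.

    images:  list of dicts; images[i] maps each pixel p of I_i to its colour
    visible: dict mapping (i, p) -> S(p) for pixels where a voxel is visible
    """
    return all((i, p) in visible
               for i, image in enumerate(images)
               for p in image)


images = [{0: "red", 1: "grey"}]
print(is_complete(images, {(0, 0): "A", (0, 1): "B"}))  # True
print(is_complete(images, {(0, 0): "A"}))               # False: pixel 1 unexplained
```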

A scene $$S$$ that is both complete and voxel-consistent with respect to the set of images $$I_i\ (i = 1..N)$$ is said to be consistent with that set of images. If $$S_V$$ denotes the set of all voxels of $$S$$ that are closer to the camera volume than a given voxel $$V$$, scene consistency can be characterized more formally as follows:


 * Suppose $$S$$ is complete and, for each voxel $$V \in S$$, $$\{V\} \cup S_V$$ is voxel-consistent. Then $$S$$ is a consistent scene.

The converse is also true:


 * If $$S$$ is a consistent scene, then $$\{V\} \cup S_V$$ is a voxel-consistent set for every $$V \in S$$.

Colour invariance
A voxel $$V$$ is said to be a colour invariant with respect to the set of images $$I_i\ (i=1..N)$$ if, for any two scenes $$S_1$$ and $$S_2$$ consistent with these images, $$V \in S_1 \cap S_2$$ implies $$colour(V,\ S_1) = colour(V,\ S_2)$$.

Introduce the following notation:


 * $$V_p = \{S(p)\ |\ S \in \hat{S},\ \|S(p)\| = \min_{S' \in \hat{S}} \|S'(p)\| \},$$ where $$\hat{S}$$ is the set of all scenes consistent with the input images and $$p = p_{I_i} \in I_i$$ for some $$i = 1..N$$,

i.e. among all voxels that belong to some scene consistent with the set of input images and are visible at pixel $$p_{I_i}$$ of image $$I_i$$, $$V_p$$ is the one closest to the camera volume.

$$V_p$$ is a colour invariant. Suppose $$V_p \in S$$ for some consistent scene $$S$$. Then $$V_p = S(p)$$: otherwise the voxel $$S(p)$$ would occlude $$V_p$$ at pixel $$p$$ and, by the ordinal visibility constraint, would be closer to the camera volume, contradicting the definition of $$V_p$$. Since scene consistency requires voxel consistency, $$colour(V_p,\ S) = colour(p,\ I_i)$$, and the same holds for every image projection of $$V_p$$. The pixel colours do not depend on $$S$$, so $$V_p$$ has the same colour in every consistent scene it belongs to, which is exactly the definition of a colour invariant.

Finally, let $$\bar{S}$$ denote the set of all $$V_p$$, i.e. the closest colour invariants corresponding to each of the pixels $$p_{I_i} \in I_i,\ i=1..N$$. Then $$\bar{S}$$ is a complete scene (it contains a voxel for every pixel of the input images) that is consistent with the set of input images (each of its voxels belongs to at least one consistent scene, so each voxel has the same colour as all of its projections). $$\bar{S}$$ constitutes the voxel colouring of the input images.
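The definitions of $$V_p$$ and $$\bar{S}$$ can be illustrated by brute force, assuming the consistent scenes are few enough to enumerate explicitly. In practice they are not; the one-pass algorithm described below obtains the voxel colouring without such enumeration. In this Python sketch (names and data hypothetical), each consistent scene is represented only by its visibility function, a dictionary from (image index, pixel) pairs to the visible voxel $$S(p)$$, and `norm` gives each voxel's distance to the camera volume.

```python
def closest_colour_invariant(consistent_scenes, pixel, norm):
    """V_p: among the voxels S(p) visible at `pixel` in the consistent
    scenes S, return the one closest to the camera volume."""
    candidates = [scene[pixel] for scene in consistent_scenes if pixel in scene]
    return min(candidates, key=norm)


def voxel_colouring_by_enumeration(consistent_scenes, pixels, norm):
    """S-bar: the set of V_p over all pixels of all input images."""
    return {closest_colour_invariant(consistent_scenes, p, norm) for p in pixels}


# Two hypothetical consistent scenes explaining two pixels of image 0.
scenes = [{(0, 0): "X", (0, 1): "Y"},
          {(0, 0): "Z", (0, 1): "Y"}]
distance = {"X": 2.0, "Y": 1.0, "Z": 1.5}
print(sorted(voxel_colouring_by_enumeration(scenes, [(0, 0), (0, 1)], distance.get)))
# ['Y', 'Z']
```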

Voxel colouring algorithm
In general, the voxel colouring algorithm consists of the following three steps:


 * Step 1: Partition the 3D space into voxel layers of uniform distance from the camera volume (see Ordinal visibility constraint section).

If $$\nu_d$$ is the set of voxels that are located at distance $$d$$ from the camera volume and $$\nu$$ is the set of all voxels, the idea of partitioning the space can be formalized as follows:
 * $$\nu_d = \{V\ |\ \|V\| = d\},$$


 * $$\nu = \bigcup \limits_{i = 1}^M \nu_{d_i},$$ where $$d_1,\dotsc, d_M$$ is an increasing sequence of numbers.
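Step 1 can be sketched in Python as grouping voxels by their distance $$\|V\|$$ and sorting the groups. The function name and the `ndigits` rounding parameter are hypothetical; the rounding is a design choice so that voxels whose floating-point distances differ only by round-off end up in the same layer. The example assumes cameras in the plane $$z = 0$$ looking down at a scene lying directly below the camera volume, so the distance of a voxel centred at $$(x, y, z)$$ is simply $$-z$$.

```python
from collections import defaultdict


def partition_into_layers(voxels, norm, ndigits=6):
    """Group voxels into layers nu_d = {V : ||V|| = d} and return them in
    order of increasing d.

    Distances are rounded to `ndigits` decimal places so that voxels whose
    floating-point distances differ only by round-off share a layer."""
    layers = defaultdict(list)
    for voxel in voxels:
        layers[round(norm(voxel), ndigits)].append(voxel)
    return [layers[d] for d in sorted(layers)]


# Hypothetical setup: scene directly below cameras in the plane z = 0,
# so the distance to the camera volume is taken as -z.
voxels = [(0, 0, -2), (1, 0, -1), (0, 1, -1)]
print(partition_into_layers(voxels, norm=lambda v: -v[2]))
# [[(1, 0, -1), (0, 1, -1)], [(0, 0, -2)]]
```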


 * Step 2: Find the colour invariant voxels in each layer.

In the simplest case, this can be achieved by iterating through all voxels in the layer, projecting each voxel into every image to determine its footprint (the set of all pixels covered by the voxel's image projections), and performing a voxel consistency test.

The projection here corresponds to "the intersection with the image plane of all rays from the camera center intersecting the voxel".
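For a pinhole camera described by a $$3 \times 4$$ projection matrix, the projection of a cubic voxel lying entirely in front of the camera is the convex hull of the projections of its eight corners. The Python sketch below (hypothetical names) uses the simpler bounding box of those projected corners, which contains the exact footprint and therefore over-approximates it. It also assumes that a pixel is included when its centre, taken to lie at integer coordinates, falls inside the box, and the example camera has no principal-point offset.

```python
import itertools
import math


def project(P, X):
    """Project 3D point X with the 3x4 camera matrix P (pinhole model)."""
    x, y, z = X
    u, v, w = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in P)
    return u / w, v / w


def footprint_bbox(P, centre, size):
    """Pixels whose centres lie in the bounding box of the projected corners
    of an axis-aligned cubic voxel with the given centre and edge length.

    Assumes the whole voxel is in front of the camera (w > 0 at all corners)."""
    h = size / 2
    corners = [(centre[0] + dx, centre[1] + dy, centre[2] + dz)
               for dx, dy, dz in itertools.product((-h, h), repeat=3)]
    us, vs = zip(*(project(P, c) for c in corners))
    return {(u, v)
            for u in range(math.ceil(min(us)), math.floor(max(us)) + 1)
            for v in range(math.ceil(min(vs)), math.floor(max(vs)) + 1)}


# Camera at the origin looking along +z with a focal length of 100 pixels.
P = [[100, 0, 0, 0],
     [0, 100, 0, 0],
     [0, 0, 1, 0]]
pixels = footprint_bbox(P, centre=(0.0, 0.0, 10.0), size=2.0)
print(len(pixels))  # 529
```

The nearest face of the voxel (at $$z = 9$$) projects to roughly $$\pm 11.1$$ pixels, giving 23 pixel centres per axis.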

In the absence of noise and quantization effects, a consistent voxel should project to a set of pixels with equal colour values. When these effects are present, the correlation $$\lambda_V$$ of the pixel colours is evaluated to measure the likelihood of voxel consistency. Different heuristics exist for choosing the correlation function; in the simplest case, the standard deviation of the colour values of the visible (unoccluded) pixels in the voxel's projections can be used as $$\lambda_V$$ and compared against a threshold equal to the maximum allowable error in colour space (chosen heuristically).
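A minimal Python sketch of this simplest test follows. How the standard deviation is combined across colour channels is not specified here; as an assumption, this sketch takes $$\lambda_V$$ to be the square root of the summed per-channel population variances of the RGB values, and `threshold` stands for the heuristically chosen maximum allowable error. The function names are hypothetical, and rejecting a voxel with no visible pixels is a further assumption.

```python
import math
import statistics


def consistency_measure(colours):
    """lambda_V for a non-empty list of (R, G, B) tuples: the square root of
    the summed per-channel population variances."""
    return math.sqrt(sum(statistics.pvariance(channel)
                         for channel in zip(*colours)))


def passes_consistency_test(colours, threshold):
    """Accept the voxel if lambda_V does not exceed the threshold.
    A voxel with no visible pixels has nothing to test and is rejected
    (an assumption of this sketch)."""
    if not colours:
        return False
    return consistency_measure(colours) <= threshold


# Nearly identical projections pass; very different ones fail.
print(passes_consistency_test([(100, 100, 100), (102, 98, 100)], threshold=5.0))  # True
print(passes_consistency_test([(0, 0, 0), (200, 200, 200)], threshold=5.0))       # False
```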


 * Step 3: Mark all image pixels corresponding to the detected colour invariant voxel, if they have not been marked previously.

Pixels are marked to account for occlusions. All pixels in the input images are initially unmarked. Let $$\pi_i$$ denote the set of unmarked pixels in the footprint of voxel $$V$$ in image $$I_i$$; if the voxel passes the consistency test, the pixels in $$\bigcup \limits_{i = 1}^N \pi_i$$ are marked as corresponding to this voxel. If part of a voxel's footprint has already been marked, the voxel is occluded in that image by one or more voxels that have already been tested and accepted: by the ordinal visibility constraint, these voxels are closer to the camera volume and therefore belong to previous layers.

Since occlusions are explicitly accounted for in the partitioning and marking steps, one pass through the voxel layers of increasing depth is enough to obtain the voxel colouring of the input images.

Pseudo-code implementation
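Seitz and Dyer give the one-pass voxel colouring algorithm in pseudo-code. Starting from an empty scene $$S$$, it visits the layers $$\nu_{d_1}, \dotsc, \nu_{d_M}$$ in order of increasing distance from the camera volume. For each voxel $$V$$ it determines the unmarked footprint pixels $$\pi_1, \dotsc, \pi_N$$ and evaluates $$\lambda_V$$; if $$\lambda_V$$ is below the threshold, it adds $$V$$ to $$S$$ and marks the pixels in $$\bigcup \limits_{i = 1}^N \pi_i$$.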
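The following Python sketch is not Seitz and Dyer's pseudo-code but an illustration of the same one-pass structure under simplifying assumptions. Images are greyscale dictionaries from pixel to value, `footprint` is a hypothetical caller-supplied function returning the set of pixels a voxel covers in image $$I_i$$, $$\lambda_V$$ is the population standard deviation of the unmarked footprint pixels, and an accepted voxel is assigned the mean of those pixel values as its colour.

```python
import statistics


def voxel_colouring(layers, footprint, images, threshold):
    """One-pass voxel colouring (illustrative sketch).

    layers:    voxel lists nu_{d_1}, ..., nu_{d_M}, ordered by increasing
               distance from the camera volume
    footprint: footprint(voxel, i) -> set of pixels of image I_i covered by
               the voxel
    images:    images[i][p] is the grey value of pixel p in I_i
    threshold: largest accepted standard deviation lambda_V
    Returns a dict mapping each accepted (opaque) voxel to its colour.
    """
    marked = [set() for _ in images]  # all pixels start unmarked
    scene = {}                        # all voxels start transparent
    for layer in layers:
        for voxel in layer:
            # pi_i: unmarked pixels of the voxel's footprint in image I_i
            pis = [footprint(voxel, i) - marked[i] for i in range(len(images))]
            colours = [images[i][p] for i, pi in enumerate(pis) for p in pi]
            if colours and statistics.pstdev(colours) <= threshold:
                scene[voxel] = statistics.fmean(colours)  # voxel becomes opaque
                for i, pi in enumerate(pis):
                    marked[i] |= pi                       # claim its pixels
    return scene


# Toy example: two images with three pixels each; "A" and "B" form the
# nearest layer and "C" lies behind them.
images = [{0: 10, 1: 50, 2: 50}, {0: 10, 1: 90, 2: 52}]
footprints = {"A": [{0}, {0}], "B": [{1}, {1}], "C": [{0, 1, 2}, {2}]}
scene = voxel_colouring([["A", "B"], ["C"]],
                        lambda v, i: footprints[v][i], images, threshold=5.0)
print(sorted(scene))  # ['A', 'C']
```

Voxel "B" is rejected because its two projections disagree (50 versus 90), and when "C" is tested, pixel 0 of image 0 is excluded because "A" has already claimed it.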

Because all voxels are presumed transparent at the start and become opaque only after passing the consistency test, Dyer compares the algorithm to clay modelling.

Examples
Seitz & Dyer presented the results of applying voxel colouring to scene reconstruction from both real and synthetic images. Well-known examples in the literature are the reconstructions of a toy dinosaur and a rose, each from 21 input images taken while the object was rotated through 360° beneath a downward-facing camera fixed above it.

Voxel colouring was shown to be effective both at re-projecting the reconstructed scene into the input views, preserving most of the fine features of the scene, and at synthesizing images of the scene from new viewpoints.