User:Vossman/3D Line Regression

Setup variables
This problem seems similar to what simple linear regression does: fit a straight line to a set of data points. However, ordinary linear regression minimizes the sum of the squared deviations between the points and the line, and it defines the deviation as the distance in the vertical (Y) direction. The problem we are going to solve in this example minimizes the direct distance between the points and the line. The direct distance is along a line that runs from the point and is perpendicular to the target line. In the following figure, the distance d is the direct distance from the point at $$(x_0,y_0,z_0)$$ to the line.

The parametric equation for a 3D line is:


 * $$x = \; x_0 + v_x*t$$
 * $$y = \; y_0 + v_y*t$$
 * $$z = \; z_0 + v_z*t$$

Where $$(x_0,y_0,z_0)$$ is some point on the line and $$(v_x,v_y,v_z)$$ is a vector defining the direction of the line. t is the parameter whose value is varied to define points on the line.
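As a small illustration of the parametric form, the following sketch evaluates a point on the line for a given t. The function name `point_on_line` and the tuple layout are assumptions made for this example only.

```python
def point_on_line(p0, v, t):
    """Return the point p0 + v*t on a 3D line.

    p0 = (x0, y0, z0) is a point on the line, v = (vx, vy, vz) is the
    direction vector, and t is the line parameter.
    """
    x0, y0, z0 = p0
    vx, vy, vz = v
    return (x0 + vx * t, y0 + vy * t, z0 + vz * t)


# Moving t = 2 along direction (1, 0, -1) from the point (1, 2, 3).
print(point_on_line((1.0, 2.0, 3.0), (1.0, 0.0, -1.0), 2.0))
```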

With this definition, there are six parameters: $$x_0,y_0,z_0,v_x,v_y,v_z$$. But this overspecifies the line: a 3D line has only four degrees of freedom, so four parameters are enough as long as the line is not parallel to the coordinate plane used to anchor the parameterization (the X-Y plane in what follows).

When fitting a function to data, it is important that there are no mutually dependent (redundant) parameters in the function. If there are mutually dependent parameters, then there is no unique solution, and an iterative fitting process may fail to converge.

So we need to eliminate two parameters.

Removing a parameter from the point on the line
Rather than allowing an arbitrary point $$(x_0,y_0,z_0)$$ to specify a point on the line, we will force $$z_0$$ to be 0 and make $$x_0$$ and $$y_0$$ be the coordinates on the X-Y plane where the line penetrates the plane (i.e., where Z is zero). This eliminates $$z_0$$ as a parameter that needs to be computed. We can do this as long as we know that the line is not parallel to the X-Y plane, so it intersects it at some point.

$$(x_0, \; y_0, \; z_0) = (x_0, \; y_0, \; 0)$$

Removing a parameter from the direction vector
Next, we will work on the direction vector $$(v_x,v_y,v_z)$$ that defines the direction of the line. Scaling the direction vector by a non-zero factor changes its length but not its direction (e.g., the direction defined by the vector <1,2,3> is the same as <2,4,6>, but the second vector is twice as long). If we scale the direction vector by $$1/v_z$$ to force $$v_z$$ to be 1, then we can define a revised direction vector,

$$(v_x', \; v_y', \; v_z') = (v_x/v_z, \; v_y/v_z, \; 1)$$

So we will force $$v_z$$ to be 1 and define $$v_x$$ and $$v_y$$ as multiples of $$v_z$$.

This eliminates $$v_z$$ as a parameter that needs to be computed. Note that this is only valid if $$v_z$$ is not zero, which means the line is not parallel to the X-Y plane. If the line may be parallel to the X-Y plane, you can divide by $$v_x$$ or $$v_y$$ instead; the line must then not be parallel to the Y-Z plane or the X-Z plane, respectively.
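The two reductions above can be combined in a short sketch that converts a six-parameter description into the four parameters $$x_0, y_0, v_x', v_y'$$. The function name `reduce_line_parameters` is hypothetical, and the step that slides the point along the line to Z = 0 is this example's way of finding where the line crosses the X-Y plane.

```python
def reduce_line_parameters(p0, v):
    """Convert (point, direction) into the four parameters (x0, y0, vx', vy').

    Assumes the line is not parallel to the X-Y plane (vz != 0).
    """
    x0, y0, z0 = p0
    vx, vy, vz = v
    if vz == 0:
        raise ValueError("line is parallel to the X-Y plane; vz must be non-zero")
    # Scale the direction vector so that its z component becomes 1.
    vx_p = vx / vz
    vy_p = vy / vz
    # Slide along the line to the point where z = 0: z0 + vz*t = 0.
    t = -z0 / vz
    return (x0 + vx * t, y0 + vy * t, vx_p, vy_p)


# Line through (1, 2, 3) with direction (2, 4, 2).
print(reduce_line_parameters((1.0, 2.0, 3.0), (2.0, 4.0, 2.0)))
```

In this example the reduced line passes through (-2, -4, 0) with direction (1, 2, 1), which reaches the original point (1, 2, 3) at t = 3.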

Use cross-products


$$ \left| \begin{matrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_x & a_y & a_z \\ v_x & v_y & v_z \end{matrix} \right| = \left[ a_y v_z - a_z v_y \right] \mathbf{i} + \left[ a_z v_x - a_x v_z \right] \mathbf{j} + \left[ a_x v_y - a_y v_x \right] \mathbf{k} $$

$$ \left| \begin{matrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ x-x_i & y-y_i & z-z_i \\ v_x' & v_y' & v_z' \end{matrix} \right| = \left[(y-y_i)*v_z' - (z-z_i)*v_y'\right] \mathbf{i} + \left[(z-z_i)*v_x' - (x-x_i)*v_z'\right] \mathbf{j} + \left[(x-x_i)*v_y' - (y-y_i)*v_x'\right] \mathbf{k} $$

In addition,


 * $$ \left| v' \right| = 1 $$

so


 * $$ v_x'^2 + v_y'^2 + v_z'^2 = 1 $$

For our least squares minimization, we want to minimize the square of the cross product in each dimension


 * $$ A^2: \left[(y-y_i)*v_z' - (z-z_i)*v_y'\right]^2 = min $$
 * $$ B^2: \left[(z-z_i)*v_x' - (x-x_i)*v_z'\right]^2 = min $$
 * $$ C^2: \left[(x-x_i)*v_y' - (y-y_i)*v_x'\right]^2 = min $$
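The components A, B and C can be sketched in code as follows. The function names are hypothetical. Dividing the sum of squares by the squared length of the direction vector, which gives the squared perpendicular distance, is the standard point-to-line identity and is added here as an assumption about how the components would be used; when the direction vector has unit length, the division has no effect.

```python
def cross_components(q, p, v):
    """Return (A, B, C), the components of (q - p) x v.

    q = (x, y, z) is a point on the line, p = (x_i, y_i, z_i) is a data
    point, and v is the direction vector of the line.
    """
    ax, ay, az = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = v
    A = ay * vz - az * vy
    B = az * vx - ax * vz
    C = ax * vy - ay * vx
    return A, B, C


def squared_distance(q, p, v):
    """Squared perpendicular distance from the data point p to the line."""
    A, B, C = cross_components(q, p, v)
    vx, vy, vz = v
    return (A * A + B * B + C * C) / (vx * vx + vy * vy + vz * vz)


# Distance from (3, 4, 5) to the Z axis: the square should be 3^2 + 4^2.
print(squared_distance((0.0, 0.0, 0.0), (3.0, 4.0, 5.0), (0.0, 0.0, 1.0)))
```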

Then for each variable: $$x, y, z, v_x', v_y',$$ and $$v_z'$$, we take the derivative:


 * $$ \frac{\partial A^2}{\partial v_z'} = 2*(y-y_i)*A = 0 $$
 * $$ \frac{\partial A^2}{\partial v_y'} = -2*(z-z_i)*A = 0 $$

Bad Optimization
So, now we have 4 variables: $$x_0, y_0, v_x', v_y'$$ to optimize from our list of points.


 * $$\sum_i \left[ (v_x'*t_i + x_0 - x_i)^2 + (v_y'*t_i + y_0 - y_i)^2 + (v_z'*t_i + z_0 - z_i)^2 \right] $$


 * $$ = \sum_i \left[ (v_x'*t_i + x_0 - x_i)^2 + (v_y'*t_i + y_0 - y_i)^2 + (t_i - z_i)^2 \right] $$

where $$t_i$$ is the parameter value for point i, and the second form uses $$z_0 = 0$$ and $$v_z' = 1$$. The last term vanishes when $$t_i = z_i$$, so we take $$t_i \approx z_i$$ for each point; therefore:


 * $$\Longrightarrow \sum_i \left[ (v_x'*z_i + x_0 - x_i)^2 + (v_y'*z_i + y_0 - y_i)^2 \right] $$

We want to minimize this sum with respect to our four variables $$x_0, y_0, v_x', v_y' \; $$, so we set the derivative with respect to each variable to zero, which generates four equations for our four unknowns:


 * $$\sum_i z_i*(v_x'*z_i + x_0 - x_i) \;= 0 $$


 * $$\sum_i (v_x'*z_i + x_0 - x_i) \;= 0 $$


 * $$\sum_i z_i*(v_y'*z_i + y_0 - y_i) \;= 0$$


 * $$\sum_i (v_y'*z_i + y_0 - y_i) \;= 0$$

Simplifying:


 * $$\sum_i (v_x'*z_i^2 + x_0*z_i) \;= \sum_i x_i*z_i $$


 * $$\sum_i (v_x'*z_i + x_0) \;= \sum_i x_i$$


 * $$\sum_i (v_y'*z_i^2 + y_0*z_i) \;= \sum_i y_i*z_i$$


 * $$\sum_i (v_y'*z_i + y_0) \;= \sum_i y_i$$

Rearranging two of the equations:


 * $$\sum_i x_0 \;= \sum_i (x_i - v_x'*z_i)$$


 * $$\sum_i y_0 \;= \sum_i (y_i - v_y'*z_i)$$

Therefore,


 * $$x_0 \;= \frac{\sum_i (x_i - v_x'*z_i)}{N}$$


 * $$y_0 \;= \frac{\sum_i (y_i - v_y'*z_i)}{N}$$

where N is the number of points.
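The four equations above are linear in the unknowns, so they can be solved in closed form. The sketch below does this with plain sums; the function name `fit_line_z_parameterized` is hypothetical, and the expression for the slopes comes from eliminating $$x_0$$ and $$y_0$$ between the paired equations. Note that with $$t \approx z_i$$ this amounts to two independent ordinary regressions (x on z and y on z), so it minimizes residuals at fixed z rather than the perpendicular distance.

```python
def fit_line_z_parameterized(points):
    """Solve the four normal equations for (x0, y0, vx', vy').

    points is a sequence of (x, y, z) tuples. Assumes the z values are
    not all equal, otherwise the system is singular.
    """
    n = len(points)
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    szz = sum(z * z for _, _, z in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)

    denom = n * szz - sz * sz
    if denom == 0:
        raise ValueError("all z values are equal; slopes are undefined")

    # Slopes from eliminating x0 and y0 between the paired equations.
    vx_p = (n * sxz - sx * sz) / denom
    vy_p = (n * syz - sy * sz) / denom
    # Intercepts: x0 = sum(x_i - vx'*z_i) / N, and likewise for y0.
    x0 = (sx - vx_p * sz) / n
    y0 = (sy - vy_p * sz) / n
    return x0, y0, vx_p, vy_p


# Points lying exactly on x = 1 + 2z, y = -1 + 0.5z.
pts = [(1.0, -1.0, 0.0), (3.0, -0.5, 1.0), (5.0, 0.0, 2.0), (7.0, 0.5, 3.0)]
print(fit_line_z_parameterized(pts))
```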

Try again
Distance from line to point, $$(x_i, y_i, z_i) \;$$:


 * $$l = \sqrt{v_x'^2 + v_y'^2 + v_z'^2} \;$$
 * $$t*l = v_x'*(x_i-x_0) + v_y'*(y_i-y_0) + v_z'*(z_i-z_0)\;$$
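The projection idea can be sketched as follows. This example assumes the general projection formula, in which the parameter of the closest point is the dot product divided by $$l^2$$; that agrees with the expression above when $$l = 1$$. The function name `project_onto_line` is hypothetical.

```python
import math


def project_onto_line(p, p0, v):
    """Return (t, distance) for the data point p and the line p0 + v*t.

    t is the parameter of the closest point on the line, and distance is
    the perpendicular distance from p to the line.
    """
    dx, dy, dz = p[0] - p0[0], p[1] - p0[1], p[2] - p0[2]
    vx, vy, vz = v
    l = math.sqrt(vx * vx + vy * vy + vz * vz)
    dot = vx * dx + vy * dy + vz * dz
    # Assumption: general (non-unit) direction vector, so divide by l^2.
    t = dot / (l * l)
    foot = (p0[0] + vx * t, p0[1] + vy * t, p0[2] + vz * t)
    return t, math.dist(p, foot)


# Point (3, 4, 5) against the Z axis, given with a direction of length 2.
print(project_onto_line((3.0, 4.0, 5.0), (0.0, 0.0, 0.0), (0.0, 0.0, 2.0)))
```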