User:Anuran/3D Projection

This is a copy of the old 2005 3D projection article which was so helpful.

A 3D projection is a mathematical transformation used to project three-dimensional points onto a two-dimensional plane. Often this is done to simulate the relationship of the camera to its subject. 3D projection is often the first step in representing three-dimensional shapes two-dimensionally in computer graphics, a process known as rendering.

The following algorithm was standard in early computer simulations and videogames, and it is still in use, with heavy modifications for each particular case. This article describes the simple, general case.

Data necessary for projection
Data about the objects to render is usually stored as a collection of points, linked together in triangles. Each point is a series of three numbers, representing its X, Y, Z coordinates from an origin relative to the object it belongs to. Each triangle is a series of three points or three indexes to points. In addition, the object has three coordinates X, Y, Z and some kind of rotation, for example three angles alpha, beta and gamma, describing its position and orientation relative to a "world" reference frame.
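As an illustration of this layout, the following is a minimal sketch in Python. The class name `Mesh`, its field names and the helper `triangle_corners` are hypothetical, chosen only for this example; real programs store the same data in many different ways.

```python
class Mesh:
    """One renderable object: shared points plus triangles that index them."""

    def __init__(self, points, triangles,
                 position=(0.0, 0.0, 0.0), angles=(0.0, 0.0, 0.0)):
        self.points = points        # (x, y, z) tuples in the object's own frame
        self.triangles = triangles  # each entry: three indexes into `points`
        self.position = position    # object origin in the world frame
        self.angles = angles        # (alpha, beta, gamma) in radians


def triangle_corners(mesh, index):
    """Return the three corner points of the triangle with the given index."""
    return [mesh.points[i] for i in mesh.triangles[index]]


# A unit square in the object's z = 0 plane, built from two triangles
# that share corners 0 and 2.
square = Mesh(
    points=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)],
    triangles=[(0, 1, 2), (0, 2, 3)],
    position=(5.0, 0.0, 10.0),
)

print(triangle_corners(square, 1))
```

Storing triangles as indexes means a point shared by several triangles is kept, and later transformed, only once.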

Last comes the observer (the term camera is the one commonly used). The camera has a second set of three X, Y, Z coordinates and three alpha, beta and gamma angles, describing the observer's position and the direction in which it is looking.

All this data is usually stored in floating point, even if many programs convert it to integers at various points in the algorithm, to speed up the calculations.

Mathematical tools
The 3D transformation makes heavy use of 4&times;4 square matrices and trigonometric functions. Each step of the algorithm is a matrix multiplication, where the elements of the matrices are derived from the coordinates and angles listed above and from various combinations of their sines and cosines. The matrices use homogeneous coordinates, in which vectors of three elements are extended to four by adding a "1" element at the end.

Given a point of the form {x, y, z, 1}, one will apply a transformation resulting in a point of the form {x', y', z', &omega;'}. The projected point on the screen is then at the 2D coordinates {x'/&omega;', y'/&omega;'}. The 1D coordinate z'/&omega;' is needed to see if the projected point is in front of the camera or behind it. The number &omega;' (in addition to the screen coordinates) is needed when drawing textured triangles, but not when drawing monochromatic triangles.
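The extension to four elements and the final division can be sketched as follows. The function names are hypothetical, and the transformed vector in the usage line is made up purely to show the arithmetic.

```python
def to_homogeneous(point):
    """Extend an (x, y, z) point to the homogeneous form (x, y, z, 1)."""
    x, y, z = point
    return (x, y, z, 1.0)


def perspective_divide(v):
    """Split a transformed (x', y', z', w') into screen position and depth.

    Returns ((x'/w', y'/w'), z'/w'), where w' stands for omega'.
    """
    xp, yp, zp, wp = v
    if wp == 0:
        # The division is undefined when omega' is 0.
        raise ValueError("omega' is 0: the point has no screen position")
    return (xp / wp, yp / wp), zp / wp


screen_xy, depth = perspective_divide((2.0, 4.0, 6.0, 2.0))
print(screen_xy, depth)
```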

Thanks to the associativity property of matrix multiplication, a program can pre-calculate many matrices, for example if it is known that some coordinate will never change.

Sometimes, a final "transformation matrix" valid for all points can be calculated, and then applied. This saves considerable time, since applying a matrix to a point uses only up to sixteen multiplications, instead of the dozens necessary to multiply matrices together.

At the very least, a transformation matrix can be calculated for a single object and then applied to all points in that object.

Note: The matrices are multiplied in the order Last matrix&times;...&times;Second matrix&times;First matrix&times;Point
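A small sketch of this pre-calculation, assuming matrices are stored as nested Python lists. The helper names `mat_mul` and `mat_vec` and the two example matrices are illustrative choices, not part of the algorithm itself.

```python
def mat_mul(a, b):
    """Product a x b of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def mat_vec(m, v):
    """Apply a 4x4 matrix to a 4-element column vector (16 multiplications)."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


# First transform: a quarter turn about the z-axis (cos = 0, sin = 1).
first = [[0.0, -1.0, 0.0, 0.0],
         [1.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
# Second transform: a shift of 5 units along x.
second = [[1.0, 0.0, 0.0, 5.0],
          [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]

# Second matrix x First matrix, computed once for all points.
combined = mat_mul(second, first)

points = [[1.0, 0.0, 0.0, 1.0], [0.0, 2.0, 0.0, 1.0]]
one_by_one = [mat_vec(second, mat_vec(first, p)) for p in points]
precombined = [mat_vec(combined, p) for p in points]
print(one_by_one == precombined)
```

Applying `combined` costs one matrix-vector product per point, however many individual transforms went into it.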

First step: world transform
The first step is to transform the points' coordinates, taking into account the position and orientation of the object they belong to. This is done using a set of four matrices:



Object translation:
$$\begin{bmatrix} 1 & 0 & 0 & x \\ 0 & 1 & 0 & y \\ 0 & 0 & 1 & z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$



Rotation about the x-axis:
$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos \alpha & -\sin \alpha & 0 \\ 0 & \sin \alpha & \cos \alpha & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$



Rotation about the y-axis:
$$\begin{bmatrix} \cos \beta & 0 & \sin \beta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin \beta & 0 & \cos \beta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$



Rotation about the z-axis:
$$\begin{bmatrix} \cos \gamma & -\sin \gamma & 0 & 0 \\ \sin \gamma & \cos \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

The four matrices are multiplied together, and the result is the world transform matrix: a matrix that, if a point's coordinates were multiplied by it, would result in the point's coordinates being expressed in the "world" reference frame.

Note that, unlike multiplication between numbers, the order used to multiply the matrices is significant: changing the order will change the results too. For the three rotation matrices, a fixed order must be chosen to suit the application at hand. The object should be rotated before it is translated, since otherwise the position of the object in the world would get rotated around the centre of the world, wherever that happens to be.

World transform = Translation &times; Rotation
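The effect of the order can be checked numerically. The sketch below uses hypothetical helper names and only a z-rotation to stay short. It moves an object's own origin through Translation &times; Rotation and through the reversed product.

```python
import math


def mat_mul(a, b):
    """Product a x b of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def mat_vec(m, v):
    """Apply a 4x4 matrix to a 4-element column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def translation(x, y, z):
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


def rotation_z(gamma):
    c, s = math.cos(gamma), math.sin(gamma)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


T = translation(5.0, 0.0, 0.0)   # the object sits at x = 5 in the world
R = rotation_z(math.pi / 2)      # and is turned a quarter turn about z
origin = [0.0, 0.0, 0.0, 1.0]    # the object's own origin

placed = mat_vec(mat_mul(T, R), origin)  # rotate first, then translate
swung = mat_vec(mat_mul(R, T), origin)   # translate first, then rotate
print(placed)  # the origin lands at the object's position
print(swung)   # the position itself has been swung around the world origin
```

With Translation &times; Rotation the object's origin ends up at (5, 0, 0) as intended. With the reversed order it ends up near (0, 5, 0), up to floating-point rounding.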

To complete the transform in the most general way possible, another matrix called the scaling matrix is used to scale the model along the axes. This matrix is multiplied with the four given above to yield the complete world transform. The form of this matrix is:



$$\begin{bmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
where s_x, s_y, and s_z are the scaling factors along the three co-ordinate axes.

Since it is usually convenient to scale the model in its own model space or co-ordinate system, scaling should be the first transformation applied. The final transform thus becomes:

World transform = Translation &times; Rotation &times; Scaling



Written out, the product Translation &times; x-rotation &times; y-rotation &times; z-rotation &times; Scaling is:
$$\begin{bmatrix} s_x\cos \gamma \cos \beta & -s_y\sin \gamma \cos \beta & s_z\sin \beta & x \\ s_x\cos \gamma \sin \beta \sin \alpha + s_x\sin \gamma \cos \alpha & s_y\cos \gamma \cos \alpha - s_y\sin \gamma \sin \beta \sin \alpha & -s_z\cos \beta \sin \alpha & y \\ s_x\sin \gamma \sin \alpha - s_x\cos \gamma \sin \beta \cos \alpha & s_y\sin \gamma \sin \beta \cos \alpha + s_y\sin \alpha \cos \gamma & s_z\cos \beta \cos \alpha & z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
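The full product can also be built step by step and compared with individual entries of the closed form. This is a sketch with hypothetical function names; the angles, scale factors and position in the usage lines are arbitrary test values.

```python
import math


def mat_mul(a, b):
    """Product a x b of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(x, y, z):
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def scaling(sx, sy, sz):
    return [[sx, 0.0, 0.0, 0.0], [0.0, sy, 0.0, 0.0],
            [0.0, 0.0, sz, 0.0], [0.0, 0.0, 0.0, 1.0]]


def rotation_x(alpha):
    c, s = math.cos(alpha), math.sin(alpha)
    return [[1.0, 0.0, 0.0, 0.0], [0.0, c, -s, 0.0],
            [0.0, s, c, 0.0], [0.0, 0.0, 0.0, 1.0]]


def rotation_y(beta):
    c, s = math.cos(beta), math.sin(beta)
    return [[c, 0.0, s, 0.0], [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0], [0.0, 0.0, 0.0, 1.0]]


def rotation_z(gamma):
    c, s = math.cos(gamma), math.sin(gamma)
    return [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def world_transform(position, angles, scale):
    """Translation x Rx x Ry x Rz x Scaling, so scaling is applied first."""
    alpha, beta, gamma = angles
    m = scaling(*scale)
    # Each later step is multiplied on the left of what has been built so far.
    for step in (rotation_z(gamma), rotation_y(beta), rotation_x(alpha),
                 translation(*position)):
        m = mat_mul(step, m)
    return m


a, b, g = 0.3, 0.5, 0.7
sx, sy, sz = 2.0, 3.0, 4.0
m = world_transform((1.0, 2.0, 3.0), (a, b, g), (sx, sy, sz))

# Top-left entry of the closed form: s_x cos(gamma) cos(beta).
print(m[0][0], sx * math.cos(g) * math.cos(b))
```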

Second step: camera transform
The second step is virtually identical to the first one, except that it uses the six coordinates of the observer instead of the object, the inverses of the matrices are used, and they are multiplied in the opposite order. (Note that $$(A \times B)^{-1} = B^{-1} \times A^{-1}$$.) The resulting matrix transforms coordinates from the world reference frame to the observer's one.

The camera looks in its z direction, the x direction is typically left, and the y direction is typically up.



Inverse translation (the inverse of a translation is a translation in the opposite direction):
$$\begin{bmatrix} 1 & 0 & 0 & -x \\ 0 & 1 & 0 & -y \\ 0 & 0 & 1 & -z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$



Inverse rotation about the x-axis (the inverse of a rotation is a rotation in the opposite direction; note that sin(&minus;x) = &minus;sin(x) and cos(&minus;x) = cos(x)):
$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos \alpha & \sin \alpha & 0 \\ 0 & -\sin \alpha & \cos \alpha & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$



Inverse rotation about the y-axis:
$$\begin{bmatrix} \cos \beta & 0 & -\sin \beta & 0 \\ 0 & 1 & 0 & 0 \\ \sin \beta & 0 & \cos \beta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$



Inverse rotation about the z-axis:
$$\begin{bmatrix} \cos \gamma & \sin \gamma & 0 & 0 \\ -\sin \gamma & \cos \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

The two matrices obtained from the first two steps can be multiplied together to get a matrix capable of transforming a point's coordinates from the object's reference frame to the observer's reference frame.


 * Camera transform = inverse rotation &times; inverse translation


 * Transform so far = camera transform &times; world transform.
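A sketch of the camera transform, assuming the camera is placed in the world with Translation &times; x-rotation &times; y-rotation &times; z-rotation, in the same way as an object. The function names are hypothetical. Each inverse rotation is obtained by negating the angle, and the inverses are multiplied in the opposite order.

```python
import math


def mat_mul(a, b):
    """Product a x b of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(x, y, z):
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def rotation_x(alpha):
    c, s = math.cos(alpha), math.sin(alpha)
    return [[1.0, 0.0, 0.0, 0.0], [0.0, c, -s, 0.0],
            [0.0, s, c, 0.0], [0.0, 0.0, 0.0, 1.0]]


def rotation_y(beta):
    c, s = math.cos(beta), math.sin(beta)
    return [[c, 0.0, s, 0.0], [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0], [0.0, 0.0, 0.0, 1.0]]


def rotation_z(gamma):
    c, s = math.cos(gamma), math.sin(gamma)
    return [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def camera_placement(position, angles):
    """Where the camera sits in the world: T x Rx x Ry x Rz (assumed order)."""
    alpha, beta, gamma = angles
    rotation = mat_mul(rotation_x(alpha),
                       mat_mul(rotation_y(beta), rotation_z(gamma)))
    return mat_mul(translation(*position), rotation)


def camera_transform(position, angles):
    """Inverse rotation x inverse translation: world frame to camera frame."""
    alpha, beta, gamma = angles
    x, y, z = position
    # Inverses in the opposite order: Rz^-1 x Ry^-1 x Rx^-1.
    inverse_rotation = mat_mul(rotation_z(-gamma),
                               mat_mul(rotation_y(-beta), rotation_x(-alpha)))
    return mat_mul(inverse_rotation, translation(-x, -y, -z))


position, angles = (1.0, 2.0, 3.0), (0.3, 0.5, 0.7)
check = mat_mul(camera_transform(position, angles),
                camera_placement(position, angles))
# `check` is the identity matrix up to floating-point rounding.
print([round(value, 9) + 0.0 for value in check[0]])
```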

Third step: perspective transform
The resulting coordinates would already be good enough for an isometric projection or something similar, but realistic rendering requires an additional step to simulate perspective distortion correctly. Indeed, this simulated perspective is the main aid for the viewer to judge distances in the simulated view.

A perspective distortion can be generated using the following 4&times;4 matrix:



$$\begin{bmatrix} 1/\tan\mu & 0 & 0 & 0 \\ 0 & 1/\tan\nu & 0 & 0 \\ 0 & 0 & \frac{B+F}{B-F} & \frac{-2BF}{B-F} \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

where &mu; is the angle between a line pointing out of the camera in the z direction and the plane through the camera and the right-hand edge of the screen, and &nu; is the angle between the same line and the plane through the camera and the top edge of the screen. This projection should look correct if you are looking with one eye, your actual physical eye is located on the line through the centre of the screen normal to the screen, and &mu; and &nu; are physically measured assuming your eye is the camera. On typical computer screens as of 2003, tan &mu; is probably about 1 1/3 times tan &nu; (a 4:3 aspect ratio), and tan &mu; might be about 1 to 5, depending on how far from the screen you are.

F is a positive number representing the distance of the observer from the front clipping plane, which is the closest any object can be to the camera. B is a positive number representing the distance to the back clipping plane, the farthest away any object can be. If objects can be at an unlimited distance from the camera, B can be infinite, in which case (B + F)/(B &minus; F) = 1 and &minus;2BF/(B &minus; F) = &minus;2F.

If you are not using a Z-buffer and all objects are in front of the camera, you can just use 0 instead of (B + F)/(B &minus; F) and &minus;2BF/(B &minus; F). (Or anything you want.)
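The behaviour of the third row can be checked numerically. The sketch below builds the matrix with a hypothetical `perspective` function and evaluates z'/&omega;' for a point on each clipping plane. The angles and clipping distances are arbitrary example values.

```python
import math


def perspective(mu, nu, front, back):
    """Perspective matrix for half-angles mu, nu and clipping distances F, B."""
    a = (back + front) / (back - front)
    b = -2.0 * back * front / (back - front)
    return [[1.0 / math.tan(mu), 0.0, 0.0, 0.0],
            [0.0, 1.0 / math.tan(nu), 0.0, 0.0],
            [0.0, 0.0, a, b],
            [0.0, 0.0, 1.0, 0.0]]


def mat_vec(m, v):
    """Apply a 4x4 matrix to a 4-element column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def depth(matrix, z):
    """z'/omega' for a point on the camera axis at distance z."""
    v = mat_vec(matrix, [0.0, 0.0, z, 1.0])
    return v[2] / v[3]


mu, nu = math.radians(30.0), math.radians(23.0)
P = perspective(mu, nu, front=1.0, back=100.0)

print(depth(P, 1.0))    # front clipping plane: about -1
print(depth(P, 100.0))  # back clipping plane: about 1

# A point on the plane through the right-hand screen edge has x = z tan(mu).
edge = mat_vec(P, [10.0 * math.tan(mu), 0.0, 10.0, 1.0])
print(edge[0] / edge[3])  # about 1, the right edge of the screen
```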

All the calculated matrices can be multiplied together to get a final transformation matrix. One can multiply each of the points (represented as a vector of three coordinates) by this matrix, and directly obtain the screen coordinate at which the point must be drawn. The vector must be extended to four dimensions using homogeneous coordinates:


 * $$\begin{bmatrix} x' \\ y' \\ z' \\ \omega' \end{bmatrix}=\begin{bmatrix}{\rm Perspective\ transform}\end{bmatrix} \times \begin{bmatrix}{\rm Camera\ transform}\end{bmatrix} \times \begin{bmatrix}{\rm World\ transform}\end{bmatrix} \times \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}.$$
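The whole chain can be sketched end to end. To keep the example short, the object and the camera here are unrotated, so the world and camera transforms reduce to translations. That is an assumption of this sketch, not of the algorithm, and all function names are hypothetical.

```python
import math


def mat_mul(a, b):
    """Product a x b of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def mat_vec(m, v):
    """Apply a 4x4 matrix to a 4-element column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def translation(x, y, z):
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def perspective(mu, nu, front, back):
    a = (back + front) / (back - front)
    b = -2.0 * back * front / (back - front)
    return [[1.0 / math.tan(mu), 0.0, 0.0, 0.0],
            [0.0, 1.0 / math.tan(nu), 0.0, 0.0],
            [0.0, 0.0, a, b],
            [0.0, 0.0, 1.0, 0.0]]


world = translation(0.0, 0.0, 10.0)   # object placed at z = 10
camera = translation(0.0, 0.0, 5.0)   # inverse of a camera sitting at z = -5
persp = perspective(math.pi / 4, math.pi / 4, front=1.0, back=100.0)

# Perspective x Camera x World, computed once and reused for every point.
final = mat_mul(persp, mat_mul(camera, world))

xp, yp, zp, wp = mat_vec(final, [3.0, 0.0, 0.0, 1.0])
screen = (xp / wp, yp / wp)
print(screen, zp / wp)
```

The point (3, 0, 0) of the object is 15 units in front of the camera, so with tan &mu; = 1 it appears at about x'/&omega;' = 3/15 = 0.2.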

Note that in computer graphics libraries such as OpenGL, the matrices are given in the opposite order to that in which they are applied: first the perspective transform, then the camera transform, then the world transform, because the library applies the transformations in the reverse of the order in which they are given. This is useful, since the world transform typically changes more often than the camera transform, and the camera transform changes more often than the perspective transform. One can, for example, pop the world transform off a stack of transforms and multiply a new world transform on, without having to do anything with the camera transform and perspective transform.

Remember that {x'/&omega;', y'/&omega;'} are the final coordinates, where {&minus;1, &minus;1} is typically the bottom left corner of the screen, {1, 1} is the top right corner of the screen, {1, &minus;1} is the bottom right corner of the screen and {&minus;1, 1} is the top left corner of the screen.

If the resulting image turns out upside down, swap the top and bottom.

If using a Z-buffer, a z'/&omega;' value of &minus;1 corresponds to the front of the Z-buffer, and a value of 1 corresponds to the back of the Z-buffer. If the front clipping plane is too close, a finite-precision Z-buffer will be less accurate. The same applies to the back clipping plane, but to a significantly lesser degree; a Z-buffer works correctly with the back clipping plane at an infinite distance, but not with the front clipping plane at 0 distance.

Objects should only be drawn where &minus;1 &le; z'/&omega;' &le; 1. If it is less than &minus;1, the object is in front of the front clipping plane. If it is more than 1, the object is behind the back clipping plane. To draw a simple single-colour triangle, {x'/&omega;', y'/&omega;'} for the three corners contains sufficient information. To draw a textured triangle, where one of the corners of the triangle is behind the camera, all the coordinates {x', y', z', &omega;'} for all three points are needed, otherwise the texture would not have the correct perspective, and the point behind the camera would not appear in the correct location. In fact, the projection of a triangle where a point is behind the camera is not technically a triangle, since the area is infinite and two of the angles sum to more than 180&deg;, the third angle being effectively negative. (Typical modern graphics libraries use all four coordinates, and can correctly draw "triangles" with some points behind the camera.) Also, if a point is on the plane through the camera normal to the camera direction, &omega;' is 0, and {x'/&omega;', y'/&omega;'} is meaningless.
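These cases can be told apart with a small test on the transformed vector. The sketch below is illustrative only: the function name and the returned labels are hypothetical. The separate check for a negative &omega;' relies on &omega;' being the camera-space z, which follows from the last row (0, 0, 1, 0) of the perspective matrix given earlier.

```python
def classify(v):
    """Classify a transformed point (x', y', z', w'), where w' is omega'."""
    xp, yp, zp, wp = v
    if wp == 0:
        # On the plane through the camera normal to the view direction.
        return "undefined"
    if wp < 0:
        # omega' equals the camera-space z, so a negative value is behind.
        return "behind camera"
    depth = zp / wp
    if depth < -1:
        return "in front of front plane"
    if depth > 1:
        return "behind back plane"
    return "drawable"


for vector in [(0.0, 0.0, 5.0, 10.0), (0.0, 0.0, -2.0, 1.0),
               (0.0, 0.0, 30.0, 10.0), (0.0, 0.0, 1.0, 0.0),
               (0.0, 0.0, 1.0, -2.0)]:
    print(vector, classify(vector))
```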

Simple Version
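2D.X = 3D.X - ((DX / (3D.Z + EyeDistance)) * 3D.Z)

2D.Y = 3D.Y - ((DY / (3D.Z + EyeDistance)) * 3D.Z)

where DX and DY are the distances between the eye and the 3D point along the X and Y axes, EyeDistance is the distance from the eye to the screen, a large positive Z is towards the horizon, and Z = 0 is the screen. By similar triangles, a point on the screen (3D.Z = 0) is left where it is, and a point at a very large Z converges on the position directly in front of the eye.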
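A sketch of this simple projection in its similar-triangles form, in which the offset from the eye is scaled by Z / (Z + EyeDistance). It assumes the eye sits at distance `eye_distance` in front of the screen plane Z = 0, at screen position `eye_xy`. The function and parameter names are hypothetical.

```python
def project_simple(point, eye_xy, eye_distance):
    """Project (x, y, z) onto the screen plane z = 0 as seen from the eye.

    Assumes z + eye_distance is not 0, i.e. the point is not level
    with the eye.
    """
    x, y, z = point
    dx = x - eye_xy[0]  # offset from the eye along X
    dy = y - eye_xy[1]  # offset from the eye along Y
    t = z / (z + eye_distance)
    return (x - dx * t, y - dy * t)


# A point on the screen is unchanged; the same point ten units deeper,
# seen from ten units away, moves halfway towards the eye's axis.
print(project_simple((4.0, 2.0, 0.0), (0.0, 0.0), 10.0))
print(project_simple((4.0, 2.0, 10.0), (0.0, 0.0), 10.0))
```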