Perspective Transforms
By Andre Yew (andrey@gluttony.ugcs.caltech.edu)
This is how I learned perspective transforms --- it was intuitive and understandable to me, so perhaps it'll be to others as well. It does require knowledge of matrix math and homogeneous coordinates. IMO, if you want to write a serious renderer, you need to know both.
First, let's look at what we're trying to do:
                    S (screen)
                    |    * P (y, z)
                    |   /|
                    |  / |
                    | /  |
                    |/   |
                    * R  |
                   /|    |
                  / |    |
                 /  |    |
      E (eye)   /   |    | W
      ---------*----|----*-------------
               <-d-><-z->
E is the eye, P is the point we're trying to project, and R is its projected position on the screen S (this is the point you want to draw on your monitor). The diagram is a side view, so it only shows Y and Z. Z goes into the monitor (left-handed coordinates), with X and Y being the width and height of the screen. The eye sits a distance d in front of the screen, and z is measured from the screen. So let's find where R is:
R = (xs, ys)
Using similar triangles (ERS and EPW), and doing the same thing for x in the top view:
xs/d = x/(z + d)
ys/d = y/(z + d)
So,
xs = x*d/(z + d)
ys = y*d/(z + d)
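Here is a minimal sketch of those two formulas in Python. The function name `project` and the check for points at or behind the eye are illustrative additions, not part of the derivation above.

```python
def project(x, y, z, d):
    """Project (x, y, z) onto the screen; the eye is a distance d in front of it."""
    w = z + d  # distance from the eye to the point, measured along Z
    if w <= 0:
        # Added guard (an assumption): the formulas only make sense in front of the eye.
        raise ValueError("point is at or behind the eye")
    return (x * d / w, y * d / w)


# A point lying on the screen plane (z = 0) stays where it is.
on_screen = project(2.0, 1.0, 0.0, 4.0)

# A point as far behind the screen as the eye is in front of it shrinks by half.
halved = project(2.0, 1.0, 4.0, 4.0)
```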
Express this homogeneously:
R = (xs, ys, zs, ws)
Make
xs = x*d
ys = y*d
zs = 0 (the screen is a flat plane)
ws = z + d
and express this as a vector transformed by a matrix:
            [ d 0 0 0 ]
[x y z 1]   [ 0 d 0 0 ]  = R
            [ 0 0 0 1 ]
            [ 0 0 0 d ]
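To see that the matrix really reproduces xs, ys and ws, here is a small pure-Python sketch. The helper names `vec_mat` and `perspective_basic` are made up for illustration. It uses the same row-vector-times-matrix layout as the text.

```python
def vec_mat(v, m):
    """Multiply a 4-element row vector by a 4x4 matrix (row-vector convention)."""
    return [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]


def perspective_basic(d):
    """The first perspective matrix: screen at z = 0, eye a distance d in front of it."""
    return [
        [d, 0, 0, 0],
        [0, d, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 0, d],
    ]


d = 4.0
r = vec_mat([2.0, 1.0, 4.0, 1.0], perspective_basic(d))  # (x*d, y*d, 0, z + d)

# Dividing by the homogeneous part ws gives back the similar-triangles result.
screen = (r[0] / r[3], r[1] / r[3])
```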
The matrix on the right side can be called a perspective transform. But we aren't done yet. See the zero in the 3rd column, 3rd row of the matrix? Make it a 1 so we retain the z value (perhaps for some kind of Z-buffer). Also, this isn't exactly what we want since we'd also like to have the eye at the origin and we'd like to specify some kind of field-of-view. So, let's call the matrix M and multiply it on the left by a translation of -d along Z, so that z is now measured from the eye instead of from the screen:
[ 1 0  0 0 ][ d 0 0 0 ]
[ 0 1  0 0 ][ 0 d 0 0 ]
[ 0 0  1 0 ][ 0 0 1 1 ]  <--- Remember, we put a 1 in (3,3) to
[ 0 0 -d 1 ][ 0 0 0 d ]       retain the z part of the vector.
And we get:
[ d 0 0 0 ]
[ 0 d 0 0 ]
[ 0 0 1 1 ]
[ 0 0 -d 0 ]
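The product is easy to check by machine. The sketch below multiplies the translation by M for one sample value of d (d = 4, chosen arbitrarily). `mat_mul` is a hypothetical helper written for this check.

```python
def mat_mul(a, b):
    """Plain 4x4 matrix product a*b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


d = 4  # arbitrary sample value

translate = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, -d, 1],
]

m = [
    [d, 0, 0, 0],
    [0, d, 0, 0],
    [0, 0, 1, 1],  # the 1 in (3,3) keeps the z value
    [0, 0, 0, d],
]

combined = mat_mul(translate, m)
```

With d = 4, the last row of `combined` comes out as [0, 0, -4, 0], which matches the [ 0 0 -d 0 ] row above.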
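Now parametrize d by the angle PEW, which is half the field-of-view (FOV/2). We want to pick d so that a point on the edge of the field of view always lands at ys = 1. With the eye at the origin, ys = y*d/z, and a point on the edge of the view has y/z = tan( FOV/2 ), so we get a nice relationship:
d = cot( FOV/2 )
Or, to put it another way, with this d any point on the top edge of the view comes out at ys = 1, no matter how far away it is.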
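A quick numeric check of this choice of d, as a sketch. The name `focal_distance` and the sample values (a 90 degree field of view, a point at z = 10) are only for illustration.

```python
import math


def focal_distance(fov_degrees):
    """d = cot(FOV/2), with the field of view given in degrees."""
    half = math.radians(fov_degrees) / 2.0
    return 1.0 / math.tan(half)


fov = 90.0
d = focal_distance(fov)

# Pick a point on the top edge of the view: y/z = tan(FOV/2).
z = 10.0
y = z * math.tan(math.radians(fov) / 2.0)

# With the eye at the origin the projection is ys = y*d/z.
ys = y * d / z
```

Here `ys` comes out as 1 up to floating-point rounding, and changing `z` does not change that.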
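Replace all the d's in the last perspective matrix with cos/sin and multiply the whole matrix through by sin (scaling a homogeneous matrix by a constant doesn't change where points end up after the divide):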
[ cos 0 0 0 ]
[ 0 cos 0 0 ]
[ 0 0 sin sin ]
[ 0 0 -cos 0 ]
With all the trig functions taking FOV/2 as their arguments. Let's refine this a little further and add near and far Z-clipping planes. Look at the lower right 2x2 matrix:
[ sin sin ]
[-cos 0 ]
and replace the first column by a and b:
[ a sin ]
[ b 0 ]
Transform our near and far boundaries, represented homogeneously as (zn, 1) and (zf, 1), respectively, and we get:
(zn*a + b, zn*sin) and (zf*a + b, zf*sin)
We want the transformed boundaries to map to 0 and 1, respectively, so divide out the homogeneous parts to get normal coordinates and equate:
(zn*a + b)/(zn*sin) = 0 (near plane)
(zf*a + b)/(zf*sin) = 1 (far plane)
Now solve for a and b and we get:
a = (zf*sin)/(zf - zn)
= sin/(1 - zn/zf)
b = -a*zn
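The solution can be sanity-checked numerically. This is a sketch with made-up helper names (`depth_coefficients`, `mapped_depth`) and arbitrary sample planes zn = 1, zf = 100.

```python
import math


def depth_coefficients(fov_degrees, zn, zf):
    """Return (a, b, s) where s = sin(FOV/2), a = s/(1 - zn/zf), b = -a*zn."""
    s = math.sin(math.radians(fov_degrees) / 2.0)
    a = s / (1.0 - zn / zf)
    b = -a * zn
    return a, b, s


def mapped_depth(z, a, b, s):
    """(z, 1) times [[a, s], [b, 0]] gives (z*a + b, z*s); divide out the second part."""
    return (z * a + b) / (z * s)


a, b, s = depth_coefficients(60.0, 1.0, 100.0)
near = mapped_depth(1.0, a, b, s)    # should land on 0
far = mapped_depth(100.0, a, b, s)   # should land on 1
```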
At last we have the familiar-looking perspective transform matrix (the a in the last row is the (3,3) entry, sin( FOV/2 )/(1 - zn/zf)):
[ cos( FOV/2 ) 0 0 0 ]
[ 0 cos( FOV/2 ) 0 0 ]
[ 0 0 sin( FOV/2 )/(1 - zn/zf) sin( FOV/2 ) ]
[ 0 0 -a*zn 0 ]
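Putting it together, here is one way the final matrix might be built and applied in Python. The names `perspective` and `transform` are hypothetical, and the sketch does no clipping, so it will divide by zero for a point with z = 0.

```python
import math


def perspective(fov_degrees, zn, zf):
    """Build the final matrix (row-vector convention, eye at the origin)."""
    half = math.radians(fov_degrees) / 2.0
    c, s = math.cos(half), math.sin(half)
    a = s / (1.0 - zn / zf)
    return [
        [c, 0.0, 0.0, 0.0],
        [0.0, c, 0.0, 0.0],
        [0.0, 0.0, a, s],
        [0.0, 0.0, -a * zn, 0.0],
    ]


def transform(point, m):
    """Transform (x, y, z) and divide out the homogeneous part. No clipping here."""
    x, y, z = point
    v = [x, y, z, 1.0]
    xs, ys, zs, ws = [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]
    return (xs / ws, ys / ws, zs / ws)


m = perspective(90.0, 1.0, 100.0)

# Top edge of a 90 degree view, sitting on the near plane: expect ys = 1, depth 0.
edge_near = transform((0.0, 1.0, 1.0), m)

# Straight ahead on the far plane: expect depth 1.
centre_far = transform((0.0, 0.0, 100.0), m)
```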
There are some pretty neat properties of the matrix. Perhaps the most interesting is how it transforms objects that go through the camera plane, and how, coupled with a clipper set up the right way, it does everything correctly. What's interesting about this is how it warps space into something called Moebius space, which is kind of like a fortune cookie except the folds pass through each other to connect the lower folds. You really have to see it to understand it. Try feeding it some vectors that go off to infinity in various directions (homogeneous w = 0) and see where they come out.
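As a starting point for that experiment, here is a small sketch. It assumes "vectors that go off to infinity" means input vectors whose fourth component is 0. The helper names and the sample values (90 degree view, zn = 1, zf = 100) are illustrative only.

```python
import math


def perspective(fov_degrees, zn, zf):
    """The final matrix from above (row-vector convention, eye at the origin)."""
    half = math.radians(fov_degrees) / 2.0
    c, s = math.cos(half), math.sin(half)
    a = s / (1.0 - zn / zf)
    return [
        [c, 0.0, 0.0, 0.0],
        [0.0, c, 0.0, 0.0],
        [0.0, 0.0, a, s],
        [0.0, 0.0, -a * zn, 0.0],
    ]


def vec_mat(v, m):
    """Row vector times 4x4 matrix, with no homogeneous divide."""
    return [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]


m = perspective(90.0, 1.0, 100.0)

forward = vec_mat([0.0, 0.0, 1.0, 0.0], m)   # infinitely far straight ahead
upward = vec_mat([0.0, 1.0, 0.0, 0.0], m)    # infinitely far straight up
behind = vec_mat([0.0, 1.0, -2.0, 1.0], m)   # an ordinary point behind the eye

# Infinity straight ahead lands at a finite depth, zf/(zf - zn), just past the far plane.
forward_depth = forward[2] / forward[3]
```

In this setup `upward` keeps a zero homogeneous part, so it stays at infinity, while `behind` gets a negative one, so its y flips sign after the divide. That flip is why the clipper has to work before the divide.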