iPhone 3D Programming by Philip Rideout

The Photography Metaphor

The assembly line metaphor illustrates how OpenGL works behind the scenes, but a photography metaphor is more useful when thinking about a 3D application’s workflow. When my wife makes an especially elaborate Indian dinner, she often asks me to take a photo of the feast for her personal blog. I usually perform the following actions to achieve this:

  1. Arrange the various dishes on the table.

  2. Arrange one or more light sources.

  3. Position the camera.

  4. Aim the camera toward the food.

  5. Adjust the zoom lens.

  6. Snap the picture.

It turns out that each of these actions has an analogue in OpenGL, although they typically occur in a different order. Setting aside the issue of lighting (which we’ll address in a future chapter), an OpenGL program performs the following actions:

  1. Adjust the camera’s field-of-view angle; this is the projection matrix.

  2. Position the camera and aim it in the appropriate direction; this is the view matrix.

  3. For each object:

    1. Scale, rotate, and translate the object; this is the model matrix.

    2. Render the object.

The product of the model and view matrices is known as the model-view matrix. When rendering an object, OpenGL ES 1.1 transforms every vertex first by the model-view matrix and then by the projection matrix. With OpenGL ES 2.0, you can perform any transforms you want, but it’s often useful to follow the same model-view/projection convention, at least in simple scenarios.

Later we’ll go over each of the three transforms (projection, view, model) in detail, but first we need to get some preliminaries out of the way. OpenGL has a unified way of dealing with all transforms, regardless of how they’re used. With ES 1.1, the current transformation state can be configured by loading matrices explicitly, like this:

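float projection[16] = { ... };
float modelview[16] = { ... };

glMatrixMode(GL_PROJECTION);
glLoadMatrixf(projection);

glMatrixMode(GL_MODELVIEW);
glLoadMatrixf(modelview);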

With ES 2.0, there is no inherent concept of model-view and projection; in fact, glMatrixMode and glLoadMatrixf do not exist in 2.0. Rather, matrices are loaded into uniform variables that are then consumed by shaders. Uniforms are a type of shader connection that we’ll learn about later, but you can think of them as constants that shaders can’t modify. They’re loaded like this:

float projection[16] = { ... };
float modelview[16] = { ... };

GLint projectionUniform = glGetUniformLocation(program, "Projection");
glUniformMatrix4fv(projectionUniform, 1, 0, projection);

GLint modelviewUniform = glGetUniformLocation(program, "Modelview");
glUniformMatrix4fv(modelviewUniform, 1, 0, modelview);

ES 1.1 provides additional ways of manipulating matrices that do not exist in 2.0. For example, the following 1.1 snippet loads an identity matrix and multiplies it by two other matrices:

float view[16] = { ... };
float model[16] = { ... };

glLoadIdentity();
glMultMatrixf(view);
glMultMatrixf(model);


The default model-view and projection matrices are identity matrices. The identity transform is effectively a no-op, as shown in Equation 2-2.

Equation 2-2. Identity transform

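$$
\begin{pmatrix} v_x & v_y & v_z & 1 \end{pmatrix} *
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
=
\begin{pmatrix} v_x \cdot 1 & v_y \cdot 1 & v_z \cdot 1 & 1 \end{pmatrix}
$$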


For details on how to multiply a vector with a matrix, or a matrix with another matrix, check out the code in the appendix.
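The appendix holds the book's own multiplication code. As a stand-alone illustration (not the appendix code), here is a minimal sketch that stores a 4×4 matrix as 16 floats row by row, like the float[16] arrays in this chapter, and multiplies using the row-vector convention. The names Mat4, Vec4, Multiply, and Identity are hypothetical.

```cpp
#include <array>
#include <cassert>

// Hypothetical types: 16 floats stored row by row, like the float[16] arrays above.
using Mat4 = std::array<float, 16>;
using Vec4 = std::array<float, 4>;

// Row vector times matrix: result[j] = sum over i of v[i] * m[i][j].
Vec4 Multiply(const Vec4& v, const Mat4& m)
{
    Vec4 result{};
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i)
            result[j] += v[i] * m[i * 4 + j];
    return result;
}

// Matrix times matrix: result[i][j] = sum over k of a[i][k] * b[k][j].
Mat4 Multiply(const Mat4& a, const Mat4& b)
{
    Mat4 result{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                result[i * 4 + j] += a[i * 4 + k] * b[k * 4 + j];
    return result;
}

// The identity matrix from Equation 2-2.
Mat4 Identity()
{
    return { 1, 0, 0, 0,
             0, 1, 0, 0,
             0, 0, 1, 0,
             0, 0, 0, 1 };
}
```

Multiplying the row vector (2, 3, 4, 1) by Identity() returns (2, 3, 4, 1) unchanged, which is the no-op that Equation 2-2 describes.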

It’s important to note that this book uses row vector notation rather than column vector notation. In Equation 2-2, both the vector on the left side, (vx vy vz 1), and the vector on the right side, (vx*1 vy*1 vz*1 1), are 4D row vectors. That equation could, however, be expressed in column vector notation like so:

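$$
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
*
\begin{pmatrix} v_x \\ v_y \\ v_z \\ 1 \end{pmatrix}
=
\begin{pmatrix} v_x \cdot 1 \\ v_y \cdot 1 \\ v_z \cdot 1 \\ 1 \end{pmatrix}
$$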

Sometimes it helps to think of a 4D row vector as being a 1×4 matrix, and a 4D column vector as being a 4×1 matrix. (n×m denotes the dimensions of a matrix where n is the number of rows and m is the number of columns.)

Figure 2-6 shows a trick for figuring out whether it’s legal to multiply two quantities in a certain order: the inner numbers should match. The outer numbers tell you the dimensions of the result. Applying this rule, we can see that it’s legal to multiply the two matrices shown in Equation 2-2: the 4D row vector (effectively a 1×4 matrix) on the left of the * and the 4×4 matrix on the right are multiplied to produce a 1×4 matrix (which also happens to be a 4D row vector).

Figure 2-6. Matrix multiplication dimensionality

From a coding perspective, I find that row vectors are more natural than column vectors because they look like tiny C-style arrays. It’s valid to think of them as column vectors if you’d like, but if you do so, be aware that the ordering of your transforms will flip around. Ordering is crucial because matrix multiplication is not commutative.

Consider this snippet of ES 1.1 code:


With row vectors, you can think of each successive transform as being premultiplied with the current transform, so the previous snippet is equivalent to the following:

With column vectors, each successive transform is postmultiplied, so the code snippet is actually equivalent to the following:

Regardless of whether you prefer row or column vectors, you should always think of the last transformation in your code as being the first one to be applied to the vertex. To make this apparent with column vectors, use parentheses to show the order of operations:

This illustrates another reason why I like row vectors; they make OpenGL’s reverse-ordering characteristic a little more obvious.
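To see the reverse ordering in action, the following sketch mimics how a glMultMatrixf-style call updates the current matrix when everything is read as row vectors: each new matrix is premultiplied onto the current one. The helper names (MultMatrix, UniformScale, Translation, Multiply) are hypothetical and are not OpenGL functions.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<float, 16>;  // row by row, like float[16] in this chapter
using Vec4 = std::array<float, 4>;

Mat4 Multiply(const Mat4& a, const Mat4& b)
{
    Mat4 result{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                result[i * 4 + j] += a[i * 4 + k] * b[k * 4 + j];
    return result;
}

// Row vector times matrix.
Vec4 Multiply(const Vec4& v, const Mat4& m)
{
    Vec4 result{};
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i)
            result[j] += v[i] * m[i * 4 + j];
    return result;
}

Mat4 UniformScale(float s)
{
    return { s, 0, 0, 0,  0, s, 0, 0,  0, 0, s, 0,  0, 0, 0, 1 };
}

Mat4 Translation(float tx, float ty, float tz)
{
    return { 1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  tx, ty, tz, 1 };
}

// Hypothetical stand-in for a glMultMatrixf-style call, written in
// row-vector terms: the new matrix is premultiplied onto the current one.
void MultMatrix(Mat4& current, const Mat4& m)
{
    current = Multiply(m, current);
}
```

Starting from the identity, calling MultMatrix with Translation(1, 0, 0) and then with UniformScale(2) leaves current equal to S * T, so the vertex (1, 0, 0, 1) is scaled first, to (2, 0, 0, 1), and then translated, to (3, 0, 0, 1). Swapping the two calls yields (4, 0, 0, 1) instead, because matrix multiplication is not commutative.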

Enough of this mathematical diversion; let’s get back to the photography metaphor and see how it translates into OpenGL. OpenGL ES 1.1 provides a set of helper functions that can generate a matrix and multiply the current transformation by the result, all in one step. We’ll go over each of these helper functions in the coming sections. Since ES 2.0 does not provide helper functions, we’ll also show what they do behind the scenes so that you can implement them yourself.

Recall that there are three matrices involved in OpenGL’s setup:

  1. Adjust the camera’s field-of-view angle; this is the projection matrix.

  2. Position the camera and aim it in the appropriate direction; this is the view matrix.

  3. Scale, rotate, and translate each object; this is the model matrix.

We’ll go over each of these three transforms in reverse so that we can present the simplest transformations first.

Setting the Model Matrix

The three most common operations when positioning an object in a scene are scale, translation, and rotation.


Scale

The most trivial helper function is glScalef:

float scale[16] = { sx, 0,  0,  0,
                    0,  sy, 0,  0,
                    0,  0,  sz, 0,
                    0,  0,  0,  1 };

// The following two statements are equivalent.
glMultMatrixf(scale);
glScalef(sx, sy, sz);

The matrix for scale and its derivation are shown in Equation 2-3.

Equation 2-3. Scale transform

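$$
\begin{pmatrix} v_x & v_y & v_z & 1 \end{pmatrix} *
\begin{pmatrix}
s_x & 0 & 0 & 0 \\
0 & s_y & 0 & 0 \\
0 & 0 & s_z & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
=
\begin{pmatrix} s_x v_x & s_y v_y & s_z v_z & 1 \end{pmatrix}
$$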

Figure 2-7 depicts a scale transform where sx = sy = 0.5.

Figure 2-7. Scale transform


Nonuniform scale is the case where the x, y, and z scale factors are not all equal to the same value. Such a transformation is perfectly valid, but it can hurt performance in some cases. OpenGL has to do more work to perform the correct lighting computations when nonuniform scale is applied.
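One way to see why nonuniform scale complicates lighting: a surface normal must stay perpendicular to its surface. The sketch below (hypothetical helper names, diagonal scale matrices only) shows that scaling a normal by the same factors as the geometry breaks perpendicularity when the factors differ, while the inverse transpose of the scale matrix, which for a diagonal matrix is just the reciprocal of each factor, preserves it. It illustrates the geometry only; it is not how any particular OpenGL implementation is written.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<float, 3>;

float Dot(const Vec3& a, const Vec3& b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Scale a vector lying in the surface (e.g. a tangent) by (sx, sy, sz).
Vec3 ScaleVector(const Vec3& v, const Vec3& s)
{
    return { v[0] * s[0], v[1] * s[1], v[2] * s[2] };
}

// Normals use the inverse transpose; for a diagonal scale matrix that is
// the reciprocal of each scale factor.
Vec3 ScaleNormal(const Vec3& n, const Vec3& s)
{
    assert(s[0] != 0 && s[1] != 0 && s[2] != 0);
    return { n[0] / s[0], n[1] / s[1], n[2] / s[2] };
}
```

With the nonuniform scale (2, 1, 1), the tangent (1, -1, 0) and normal (1, 1, 0) stop being perpendicular if both go through ScaleVector, but stay perpendicular if the normal goes through ScaleNormal. With a uniform scale such as (2, 2, 2), the two functions differ only by a constant factor, so the normal's direction survives either way.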


Translation

Another simple helper transform is glTranslatef, which shifts an object by a fixed amount:

float translation[16] = { 1,  0,  0,  0,
                          0,  1,  0,  0,
                          0,  0,  1,  0,
                          tx, ty, tz, 1 };

// The following two statements are equivalent.
glMultMatrixf(translation);
glTranslatef(tx, ty, tz);

Intuitively, translation is achieved with addition, but recall that homogeneous coordinates allow us to express all transformations using multiplication, as shown in Equation 2-4.

Equation 2-4. Translation transform

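$$
\begin{pmatrix} v_x & v_y & v_z & 1 \end{pmatrix} *
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
t_x & t_y & t_z & 1
\end{pmatrix}
=
\begin{pmatrix} v_x + t_x & v_y + t_y & v_z + t_z & 1 \end{pmatrix}
$$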
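Equation 2-4 can be checked in a few lines of code. This minimal sketch uses hypothetical helper names and the same row-vector layout as the translation[16] array: a w component of 1 is what picks up the offsets from the last row, while a w of 0, as is commonly used for directions, leaves the vector unchanged.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<float, 16>;
using Vec4 = std::array<float, 4>;

// Row vector times matrix.
Vec4 Multiply(const Vec4& v, const Mat4& m)
{
    Vec4 result{};
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i)
            result[j] += v[i] * m[i * 4 + j];
    return result;
}

// Same layout as translation[16]: the offsets occupy the last row.
Mat4 Translation(float tx, float ty, float tz)
{
    return { 1,  0,  0,  0,
             0,  1,  0,  0,
             0,  0,  1,  0,
             tx, ty, tz, 1 };
}
```

With tx = 0.25 and ty = 0.5 (the values used for Figure 2-8), the point (1, 2, 3, 1) moves to (1.25, 2.5, 3, 1), while the direction (1, 2, 3, 0) is unaffected.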

Figure 2-8 depicts a translation transform where tx = 0.25 and ty = 0.5.

Figure 2-8. Translation transform

Rotation

You might recall the rotation transform from the fixed-function variant (ES 1.1) of the HelloArrow sample:

glRotatef(m_currentAngle, 0, 0, 1);

This applies a counterclockwise rotation about the z-axis. The first argument is an angle in degrees; the latter three arguments define the axis of rotation. The ES 2.0 renderer in HelloArrow was a bit tedious because it computed the matrix manually:

#include <cmath>

// Pi is assumed to come from the vector library; define it here if it doesn't.
const float Pi = 4 * std::atan(1.0f);

// Convert degrees to radians, then build a rotation about the z-axis
// in the same row-vector layout as the other matrices in this chapter.
float radians = m_currentAngle * Pi / 180.0f;
float s = std::sin(radians);
float c = std::cos(radians);
float zRotation[16] = { c, s, 0, 0,
                       -s, c, 0, 0,
                        0, 0, 1, 0,
                        0, 0, 0, 1 };

GLint modelviewUniform = glGetUniformLocation(m_simpleProgram, "Modelview");
glUniformMatrix4fv(modelviewUniform, 1, 0, &zRotation[0]);

Figure 2-9 depicts a rotation transform where the angle is 45°.

Figure 2-9. Rotation transform

Rotation about the z-axis is relatively simple, but rotation around an arbitrary axis requires a more complex matrix. For ES 1.1, glRotatef generates the matrix for you, so there’s no need to get too concerned with its contents. For ES 2.0, check out the appendix to see how to implement this.
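For reference, here is a sketch of the matrix that the glRotatef documentation describes, written out in the row-vector layout of the zRotation array. It is not the appendix's code; the function name Rotation and the std::array storage are assumptions. Like glRotatef, it normalizes the axis first.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat4 = std::array<float, 16>;

// Counterclockwise rotation by `degrees` about the axis (x, y, z).
Mat4 Rotation(float degrees, float x, float y, float z)
{
    float length = std::sqrt(x * x + y * y + z * z);
    assert(length > 0);  // the axis must not be the zero vector
    x /= length;
    y /= length;
    z /= length;

    const float Pi = 4 * std::atan(1.0f);
    float radians = degrees * Pi / 180.0f;
    float c = std::cos(radians);
    float s = std::sin(radians);
    float t = 1 - c;

    // Each row is the image of one basis vector, as in zRotation.
    return { t * x * x + c,     t * x * y + s * z, t * x * z - s * y, 0,
             t * x * y - s * z, t * y * y + c,     t * y * z + s * x, 0,
             t * x * z + s * y, t * y * z - s * x, t * z * z + c,     0,
             0,                 0,                 0,                 1 };
}
```

Rotation(90, 0, 0, 1) reproduces the zRotation matrix with c = 0 and s = 1, and a 120° turn about (1, 1, 1) carries the x-axis onto the y-axis.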

By itself, glRotatef rotates only around the origin, so what if you want to rotate around an arbitrary point p? To accomplish this, use a three-step process:

  1. Translate by -p.

  2. Perform the rotation.

  3. Translate by +p.

For example, to change HelloArrow to rotate around (0, 1) rather than the center, you could do this:

// Read from the bottom up: move the pivot (0, 1) to the origin,
// rotate, then move it back.
glTranslatef(0, +1, 0);
glRotatef(m_currentAngle, 0, 0, 1);
glTranslatef(0, -1, 0);

Remember, the last transform in your code is actually the first one that gets applied!
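The same three-step recipe can be collapsed into a single matrix. With row vectors the steps are multiplied in the order they are applied, T(-p) * R * T(+p). The sketch below uses hypothetical helper names, handles rotation about z only, and checks that the pivot stays put.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat4 = std::array<float, 16>;
using Vec4 = std::array<float, 4>;

Mat4 Multiply(const Mat4& a, const Mat4& b)
{
    Mat4 result{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                result[i * 4 + j] += a[i * 4 + k] * b[k * 4 + j];
    return result;
}

// Row vector times matrix.
Vec4 Multiply(const Vec4& v, const Mat4& m)
{
    Vec4 result{};
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i)
            result[j] += v[i] * m[i * 4 + j];
    return result;
}

Mat4 Translation(float tx, float ty, float tz)
{
    return { 1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  tx, ty, tz, 1 };
}

// Counterclockwise rotation about z, in the zRotation layout.
Mat4 RotationZ(float degrees)
{
    const float Pi = 4 * std::atan(1.0f);
    float radians = degrees * Pi / 180.0f;
    float c = std::cos(radians);
    float s = std::sin(radians);
    return { c, s, 0, 0,  -s, c, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1 };
}

// Translate by -p, rotate, then translate by +p.
Mat4 RotationAboutPoint(float degrees, float px, float py, float pz)
{
    Mat4 toOrigin = Translation(-px, -py, -pz);
    Mat4 back = Translation(px, py, pz);
    return Multiply(Multiply(toOrigin, RotationZ(degrees)), back);
}
```

For a 90° turn about (0, 1), the pivot (0, 1) maps to itself and the origin swings around to (1, 1).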

Setting the View Transform

The simplest way to create a view matrix is with the popular LookAt function. It’s not built into OpenGL ES, but it’s easy enough to implement it from scratch. LookAt takes three parameters: a camera position, a target location, and an “up” vector to define the camera’s orientation (see Figure 2-10).

Using the three input vectors, LookAt produces a transformation matrix that would otherwise be cumbersome to derive using the fundamental transforms (scale, translation, rotation). Example 2-1 is one possible implementation of LookAt.

Example 2-1. LookAt

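mat4 LookAt(const vec3& eye, const vec3& target, const vec3& up)
{
    // Build an orthonormal camera basis; the camera looks down its -z axis.
    vec3 z = (eye - target).Normalized();
    vec3 x = up.Cross(z).Normalized();
    vec3 y = z.Cross(x).Normalized();

    mat4 m;
    m.x = vec4(x, 0);
    m.y = vec4(y, 0);
    m.z = vec4(z, 0);
    m.w = vec4(0, 0, 0, 1);

    // Rotate the negated eye position into the camera basis, then transpose
    // so the basis vectors become columns, and store the result in the last row.
    vec4 eyePrime = m * -eye;
    m = m.Transposed();
    m.w = eyePrime;

    return m;
}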

Figure 2-10. The LookAt transform

Note that Example 2-1 uses custom types like vec3, vec4, and mat4. This isn’t pseudocode; it’s actual code from the C++ vector library in the appendix. We’ll discuss the library later in the chapter.
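If you want to experiment before reading the appendix, the sketch below builds the same kind of view matrix with plain arrays instead of vec3 and mat4. It is an illustration under assumptions, not the appendix code: the helper names are hypothetical, and the result uses this chapter's row-vector layout, with the camera basis in the first three columns and the eye position (negated and expressed in that basis) in the last row.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<float, 3>;
using Mat4 = std::array<float, 16>;

Vec3 Sub(const Vec3& a, const Vec3& b) { return { a[0] - b[0], a[1] - b[1], a[2] - b[2] }; }
float Dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }

Vec3 Cross(const Vec3& a, const Vec3& b)
{
    return { a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0] };
}

Vec3 Normalized(const Vec3& v)
{
    float length = std::sqrt(Dot(v, v));
    assert(length > 0);
    return { v[0] / length, v[1] / length, v[2] / length };
}

Mat4 LookAtRows(const Vec3& eye, const Vec3& target, const Vec3& up)
{
    Vec3 z = Normalized(Sub(eye, target));  // camera looks down its -z axis
    Vec3 x = Normalized(Cross(up, z));
    Vec3 y = Cross(z, x);                   // already unit length

    return { x[0], y[0], z[0], 0,
             x[1], y[1], z[1], 0,
             x[2], y[2], z[2], 0,
             -Dot(x, eye), -Dot(y, eye), -Dot(z, eye), 1 };
}

// Row vector (p, 1) times m; w stays 1 for this affine matrix.
Vec3 TransformPoint(const Vec3& p, const Mat4& m)
{
    Vec3 r{};
    for (int j = 0; j < 3; ++j)
        r[j] = p[0] * m[j] + p[1] * m[4 + j] + p[2] * m[8 + j] + m[12 + j];
    return r;
}
```

A camera at (0, 0, 5) looking at the origin maps the eye to (0, 0, 0) and the target to (0, 0, -5), i.e., straight ahead along the negative z-axis.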

Setting the Projection Transform

Until this point, we’ve been dealing with transformations that are typically used to modify the model-view rather than the projection. ES 1.1 operations such as glRotatef and glTranslatef always affect the current matrix, which can be changed at any time using glMatrixMode. Initially the matrix mode is GL_MODELVIEW.

What’s the distinction between projection and model-view? Novice OpenGL programmers sometimes think of the projection as being the “camera matrix,” but this is an oversimplification, if not completely wrong; the position and orientation of the camera should actually be specified in the model-view. I prefer to think of the projection as being the camera’s “zoom lens” because it affects the field of view.


Camera position and orientation should always go in the model-view, not the projection. OpenGL ES 1.1 depends on this to perform correct lighting calculations.

Two types of projections commonly appear in computer graphics: perspective and orthographic. Perspective projections cause distant objects to appear smaller, just as they do in real life. You can see the difference in Figure 2-11.

Figure 2-11. Types of projections

An orthographic projection is usually appropriate only for 2D graphics, so that’s what we used in HelloArrow:

const float maxX = 2;
const float maxY = 3;
glOrthof(-maxX, +maxX, -maxY, +maxY, -1, 1);

The arguments for glOrthof specify the distance of the six bounding planes from the origin: left, right, bottom, top, near, and far. Note that our example arguments create an aspect ratio of 2:3; this is appropriate since the iPhone’s screen is 320×480. The ES 2.0 renderer in HelloArrow reveals how the orthographic projection is computed:

float a = 1.0f / maxX;
float b = 1.0f / maxY;
float ortho[16] = {
    a, 0,  0, 0,
    0, b,  0, 0,
    0, 0, -1, 0,
    0, 0,  0, 1
};

When an orthographic projection is centered on the origin, it’s really just a special case of the scale matrix presented earlier (Equation 2-3), with these scale factors:

float sx = 1.0f / maxX;
float sy = 1.0f / maxY;
float sz = -1;

float scale[16] = { sx, 0,  0,  0,
                    0,  sy, 0,  0,
                    0,  0,  sz, 0,
                    0,  0,  0,  1 };
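The centered case is a special case of the general matrix that the glOrthof documentation gives, which also includes a translation for off-center volumes. Here is a sketch in this chapter's row-vector layout; the function name Ortho and the nearZ/farZ parameter names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat4 = std::array<float, 16>;

// Row-vector layout of the glOrthof matrix: scale factors on the diagonal,
// translation in the last row.
Mat4 Ortho(float left, float right, float bottom, float top, float nearZ, float farZ)
{
    assert(left != right && bottom != top && nearZ != farZ);
    return { 2 / (right - left), 0, 0, 0,
             0, 2 / (top - bottom), 0, 0,
             0, 0, -2 / (farZ - nearZ), 0,
             -(right + left) / (right - left),
             -(top + bottom) / (top - bottom),
             -(farZ + nearZ) / (farZ - nearZ),
             1 };
}
```

With HelloArrow’s arguments (-2, 2, -3, 3, -1, 1), the last row is all zeros apart from w, and the function reproduces the ortho array. A pixel-style volume such as (0, 320, 0, 480, -1, 1) instead gets a last row of (-1, -1, 0, 1), which moves the lower-left corner of the screen to the origin.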

Since HelloCone (the example you’ll see later in this chapter) will have true 3D rendering, we’ll give it a perspective matrix using the glFrustumf command, like this:

glFrustumf(-1.6f, 1.6f, -2.4f, 2.4f, 5, 10);

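glFrustumf takes the same six arguments as glOrthof (left, right, bottom, top, near, far), although for glFrustumf the left, right, bottom, and top values describe the near plane. Since glFrustumf does not exist in ES 2.0, HelloCone’s 2.0 renderer will compute the matrix manually, like this: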

void ApplyFrustum(float left, float right, float bottom,
                  float top, float near, float far)
{
    float a = 2 * near / (right - left);
    float b = 2 * near / (top - bottom);
    float c = (right + left) / (right - left);
    float d = (top + bottom) / (top - bottom);
    float e = - (far + near) / (far - near);
    float f = -2 * far * near / (far - near);

    // Row-vector layout: the -1 in m.z.w copies -z into the clip-space w,
    // and m.w.w is 0 so that w is exactly -z.
    mat4 m;
    m.x.x = a; m.x.y = 0; m.x.z = 0; m.x.w = 0;
    m.y.x = 0; m.y.y = b; m.y.z = 0; m.y.w = 0;
    m.z.x = c; m.z.y = d; m.z.z = e; m.z.w = -1;
    m.w.x = 0; m.w.y = 0; m.w.z = f; m.w.w = 0;

    // projectionUniform is assumed to come from glGetUniformLocation.
    glUniformMatrix4fv(projectionUniform, 1, 0, m.Pointer());
}
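To check a frustum matrix numerically, you can push points through it and perform the perspective divide. The sketch below uses hypothetical names and stores the a through f terms from ApplyFrustum in a plain array, with the bottom-right entry set to 0 so that the clip-space w equals -z. For HelloCone’s glFrustumf arguments, it confirms that the near plane lands at a normalized depth of -1 and the far plane at +1.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat4 = std::array<float, 16>;
using Vec4 = std::array<float, 4>;

Mat4 Frustum(float left, float right, float bottom, float top, float nearZ, float farZ)
{
    assert(nearZ > 0 && farZ > nearZ && left != right && bottom != top);
    float a = 2 * nearZ / (right - left);
    float b = 2 * nearZ / (top - bottom);
    float c = (right + left) / (right - left);
    float d = (top + bottom) / (top - bottom);
    float e = -(farZ + nearZ) / (farZ - nearZ);
    float f = -2 * farZ * nearZ / (farZ - nearZ);
    return { a, 0, 0,  0,
             0, b, 0,  0,
             c, d, e, -1,
             0, 0, f,  0 };
}

// Row vector times matrix, followed by the perspective divide by w.
Vec4 ToNormalizedDevice(const Vec4& v, const Mat4& m)
{
    Vec4 clip{};
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i)
            clip[j] += v[i] * m[i * 4 + j];
    return { clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3], 1 };
}
```

The top-right corner of the near plane, (1.6, 2.4, -5), maps to (1, 1, -1), and the center of the far plane, (0, 0, -10), maps to a depth of +1.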

When a perspective projection is applied, the field of view is in the shape of a frustum. The viewing frustum is just a chopped-off pyramid with the eye at the apex of the pyramid (see Figure 2-12).

Figure 2-12. Viewing frustum

A viewing frustum can also be computed based on the angle of the pyramid’s apex (known as field of view); some developers find this more intuitive than specifying all six planes. The function in Example 2-2 takes four arguments: the field-of-view angle, the aspect ratio of the pyramid’s base, and the near and far planes.

Example 2-2. VerticalFieldOfView

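#include <cmath>

// Pi is assumed to be defined by the vector library.
void VerticalFieldOfView(float degrees, float aspectRatio,
                         float near, float far)
{
    // Half the vertical angle, in radians, sets the top of the near plane.
    float top = near * std::tan(degrees * Pi / 360.0f);
    float bottom = -top;
    // aspectRatio is width / height.
    float left = bottom * aspectRatio;
    float right = top * aspectRatio;

    glFrustumf(left, right, bottom, top, near, far);
}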
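Because Example 2-2 hands its results straight to glFrustumf, it can be handy to compute the six bounds separately, for example to pass them to an ES 2.0 routine like ApplyFrustum. The struct and function names below are hypothetical; as in Example 2-2, aspectRatio is width divided by height.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical holder for the six glFrustumf arguments.
struct FrustumBounds
{
    float left, right, bottom, top, nearZ, farZ;
};

FrustumBounds VerticalFieldOfViewBounds(float degrees, float aspectRatio,
                                        float nearZ, float farZ)
{
    assert(degrees > 0 && degrees < 180 && nearZ > 0 && farZ > nearZ);
    const float Pi = 4 * std::atan(1.0f);
    // degrees * Pi / 360 is half the vertical angle, in radians.
    float top = nearZ * std::tan(degrees * Pi / 360.0f);
    float right = top * aspectRatio;
    return { -right, right, -top, top, nearZ, farZ };
}
```

For example, a 90° vertical field of view with an aspect ratio of 2/3 and near = 5 gives top = 5 and right ≈ 3.33. Working backward, HelloCone’s glFrustumf(-1.6f, 1.6f, -2.4f, 2.4f, 5, 10) corresponds to a vertical field of view of 2 · atan(2.4 / 5) ≈ 51.3° with an aspect ratio of 1.6 / 2.4 = 2/3.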


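For perspective projection, avoid setting your near or far plane to zero or a negative number. Mathematically this just doesn’t work out: with the standard frustum matrix, near = 0 maps every point in front of the camera to the same depth after the perspective divide, and a negative value would put the near plane behind the eye. In ES 1.1, glFrustumf reports GL_INVALID_VALUE when near or far is not positive.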
