Normal Packing algorithms are getting a lot of interest recently with the generalization of deferred rendering. A good reference can be found here. But most of these algorithms are designed to efficiently pack/unpack from the fragment stage. They takes their input from the entire normalized 3D space an output to non-uniform packed vector spaces. This leads to uneven distribution of the unpacked normal vectors, particularly noticeable when using a small discrete interval.

Our needs are a bit different though; we want to pack our terrain normals on a single byte for vertex lighting without too much visual degradation. We are only interested in normal vectors from the Z+ hemisphere and since our packed space is 8 bits wide, we can’t afford to much invalid/duplicate values.

#### Unpacking

Considering an unsigned byte value, lets decompose it in two distinct 4-bits values:

The 4 high order bits give us the general direction of our point on the unit circle. We have 16 possible directions, hence the PI/8 constant term:

const float th = (PI / 8.0f) * (float)(b / 16);

The remaining 4 bits give us the distance from the origin, expressed on [0, 1]:

const float d = sqrt((float)(b % 16) / 15.0001f);

A delta of +0.0001 is added to compensate from precision error creating overflow situations in edge cases. If we use the distance as-is, we have an accumulation of vectors around the top of our hemisphere. The square root of this value gives us a far better looking distribution:

‘th’ and ‘d’ gives us the position of a point, P, inside the unit circle. The x and y components of our normal are the projection of P on the X and Y axis:

const float nx = cos(th) * d; const float ny = sin(th) * d;

A unit vector has the following property x² + y² + z² = 1. Let’s isolate the z term:

z² = 1 – (x² + y²) => z = sqrt(1 – (x² + y²)).

const float nz = sqrt(1.0f - (nx * nx + ny * ny));

With this technique, we can express 241 different vectors out of the 255 possible values.

#### Packing

The interesting part is of course the packing process:

We start by computing the squared distance to the origin, ‘d’, that will be stored in the 4 low order bits of the packed value. This is the squared length of the projection of our input normal vector on the XY plane :

pow(length(n.xy), 2);

The pow() cancels the sqrt() inside the len() function, so we can reduce the expression to:

const float d = n.x * n.x + n.y * n.y; // ie. pow(length(n.xy), 2);

Now, we must find the angle ‘th’ between the projected of the normal on the XY plane and the X axis. On the unit circle, the x term of a normalized vector is the cosine of the angle (the y term being the sinus). We must compute the inverse of cosine function to get our angle back: acos(normalize(n.xy).x);

Since we don’t care about the y term, the expression is reduced to:

const float acth = glm::acos(n.x / glm::sqrt(d)); //ie. acos(normalize(n.xy).x);

But there is a catch … acos() output is defined on [0, PI], we therefore use the sign of the y term of our input vector to determine if our value lies in the upper of lower quadrants of the unit circle.

const float th = (n.y >= 0) ? acth : PI2 - acth;

We finally compute the packed angle and length components. Note that we add an offset of PI/32 (half the step of our packed angle value) so that values are clamped to the nearest step and not floored.

const byte angle = (byte)(((th + PIOVER32) / PI2) * 16.0f);

const byte length = (byte)(ln * 16.0f);

The our packing/unpacking functions are

static inline const byte pack_normal(const glm::vec3 & n) { const float d = n.x * n.x + n.y * n.y; // ie. pow(length(n.xy), 2); const float acth = glm::acos(n.x / glm::sqrt(d)); //ie. acos(normalize(n.xy).x); const float th = (n.y >= 0) ? acth : PI2 - acth; const byte angle = (byte)(((th + PIOVER32) / PI2) * 16.0f); const byte length = (byte)(ln * 16.0f); return (angle << 4) | length; }

static inline const glm::vec3 unpack_normal(const byte b) { const float th = (PI / 8.0f) * (b / 16); const float d = sqrt((float)(b % 16) / 15.0001f); const float nx = cos(th) * d; const float ny = sin(th) * d; const float nz = sqrt(1.0f - (nx * nx + ny * ny)); return glm::vec3(nx, ny, nz); }

Since the unpacking is done by the vertex shader, here is the GLSL implementation :

#define PIOVER8 0.39269908169 vec3 unpack(in float pk_n) { float n_th = PIOVER8 * pk_n / 16.0; float n_d = sqrt(mod(pk_n, 16.0) / 15.001); vec2 nv = vec2(cos(n_th), sin(n_th)) * n_d; float nz = sqrt(1.0 - (nv.x * nv.x + nv.y * nv.y)); return vec3(nv, nz); }

Of course, if your glsl version supports static array definition, you can simply use direct indexing in the array outputted by the following code :

void dumpArray() { printf("const vec3 unpaked_normals[256] = vec3[]("); for (int i = 0; i < 0xFF; ++i) { const glm::vec3 & v = unpack((byte)i); printf("vec3(%1.8f, %1.8f, %1.8f), ", v.x, v.y, v.z); }{ const glm::vec3 & v = unpack(0xFF); printf("vec3(%1.8f, %1.8f, %1.8f));", v.x, v.y, v.z); } }

… and correct the generated syntax as needed.