Terrible Cryptography with Zig: MD5

I’ve picked up something of an interest in cryptography recently, so I decided to reimplement an old-school (and insecure) hashing algorithm: MD5.

In case you don’t ~~live under a rock~~ keep up with technology, you’ve may have heard of MD5 before and know its history. So please forgive me for the following disclaimer:

Disclaimer

MD5 is an insecure and broken hash function. Do not use it in actual applications that need security. It is only used in this post for pedagogical purposes. The author is not (yet) a competent cryptographer, so consume this information with a healthy dose of salt.
With that out of the way, let’s begin.

Message Digest 5

MD5 is Merkle-Damgård hash function, which means that it accepts as input a large quantity of data and maps it into a smaller fixed size, which is hopefully meaningful.

For the implementation we will use Zig, a great replacement for C. We will not use the Holy K&R C that God gave to Abraham on Mount Bell Labs. We use Zig for great justice.

MD5 is relatively simple and a spec exists so we can jump straight into an implementation. Starting from the beginning:

The Specification

Network Working Group                                          R. Rivest
Request for Comments: 1321           MIT Laboratory for Computer Science
                                             and RSA Data Security, Inc.
                                                              April 1992


                     The MD5 Message-Digest Algorithm

Alright, we’re in the right place. Let’s skip the boring stuff and extract the important parts of the “summary.”

This document describes the MD5 message-digest algorithm. The algorithm takes as input a message of arbitrary length and produces as output a 128-bit “fingerprint” or “message digest” of the input. […] The MD5 algorithm is intended for digital signature applications, where a large file must be “compressed”[.] […] The MD5 algorithm is designed to be quite fast on 32-bit machines. […]

Okay it’s a hashing function (we already knew that) that takes large messages and generates a fingerprint describing the original large message. We also know MD5 is optimized for 32-bit machines, so we can expect decent performance on modern processors.
For the purposes of this post, we will only handle messages that have lengths in bit-multiples of 8 (full bytes only); this will make our implementation easier. For teaching purposes, we will also make a duplicate of our message to work on to make padding easier. (This is also terribly inefficient).

Append and Pad

First we go to §3.1, Step 1: Append Padding. We need to split and pad our message payload into zero or more blocks of 512 bits, and a final trailing block of 448 bits with the leftovers.

On the last block we need to add a single 1 bit, and then keep appending zeros to that block until we get a final block length of M bits where M ≡ 448 (mod 512). HINT: If that equation confused you, go read the wiki page on modular arithmetic then come back.
If we end up padding with 0’s into another next block that’s fine, we’ll just add the first 512 bits of the “padding” blocks to the other 512-bit blocks and only continue with the final block, which ends up 64 bits shy of 512 at 448 bits.

Our padding code looks something like this:

const std = @import("std");
const Message = std.ArrayList(u8);

fn appendPaddingAndLength(bytes: []const u8, alloc: std.mem.Allocator) ![]u8 {
    var arrList = Message.init(alloc);
    var targetSize: usize = bytes.len;
    while (@rem(targetSize, 512/8) != 448/8 or targetSize <= bytes.len + 64/8) {
        targetSize += 1;
    }
    try arrList.ensureTotalCapacity(targetSize + 8);
    try arrList.appendSlice(bytes[0..]);

    try arrList.append(0x80);
    while (arrList.items.len % (512/8) != (448/8)) {
        try arrList.append(0x00);
    }
    // continued below

So for our final block we should now have a length of 448 bits (56 bytes).
Now for §3.2, Step 2: Append Length. The RFC says:

A 64-bit representation of b (the length of the message before the padding bits were added) is appended to the result of the previous step. […] the resulting message […] has a length that is an exact multiple of 512 bits. […]

Okay. Now let’s write the bottom 64 bits of the message length (in bytes) to that block to round it out to 512 bits (64 bytes):

    // continued from above
    const used_len = @truncate(u64, @intCast(u128, bytes.len) * 8);
    const pos = arrList.items.len;

    var i: i32 = 0;
    while (i < 8) : (i += 1) {
        try arrList.append(0);
    }
    // the spec specifies the appended length is Little Endian
    std.mem.writeIntLittle(u64, arrList.items[pos..].ptr[0..8], used_len);
    
    return arrList.toOwnedSlice();
}

Now that we’re done with the prep work, we can do actual hashing. It’s time for §3.3, Step 3: Initialize the Buffer. MD5 uses an 16-byte internal state, which is initially defined as:

const digest_initial: [16]u8 align(@alignOf(u32)) = [16]u8{
    0x01, 0x23, 0x45, 0x67,
    0x89, 0xAB, 0xCD, 0xEF,
    0xFE, 0xDC, 0xBA, 0x98,
    0x76, 0x54, 0x32, 0x10,
};

The current state of the message digest is represented as four 32 bit little-endian integers. For example if the state was:

0xA4A3A2A1,
0xB4B3B2B1,
0xC4C3C2C1,
0xD4D3D2D1

then our digest represented as bytes would be:

0xA1, 0xA2, 0xA3, 0xA4,
0xB1, 0xB2, 0xB3, 0xB4,
0xC1, 0xC2, 0xC3, 0xC4,
0xD1, 0xD2, 0xD3, 0xD4

Now we’re starting to get somewhere. It’s time for meat of the MD5 algorithm.

The “Core” Algorithm

MD5 does four rounds of computation on each block then passes on the existing state to the next block. It looks something like this

fn calcMD5(bytes: []const u8, alloc: std.mem.Allocator) !*[16]u8 {
    var padded = try appendPaddingAndLength(bytes[0..], alloc);
    defer _ = alloc.free(padded);

    var digest = try alloc.alloc(u8, 16);
    @memcpy(digest.ptr, digest_initial[0..], 16);

    const N = (padded.len + 8) / (512/8);
    var msg = @intToPtr([*]u32, @ptrToInt(padded.ptr));

    var i: usize = 0;
    while (i < N): (i += 1) {
        const start = i * 16;
        const end = i * 16 + 16;
        processBlock(digest[0..16], msg[start..end]);
    }

    return digest[0..16];
}

Then we can write processBlock. Note the code uses the @addWithOverflow Zig builtin because the addition in MD5 is Modulo 2^32, but Zig detects integer overflows and reports if they occur unless we use the builtin. We don’t care about the overflow in this case, so we tell Zig to ignore it.

// offset indices in bytes for each 32-bit "word" in MD5 digest
const iA = 0;
const iB = 4;
const iC = 8;
const iD = 12;

// helper functions
const readWord = std.mem.readIntLittle;
const writeWord = std.mem.writeIntLittle;

// core algorithm of MD5. See §3.4, Step 4
fn processBlock(digest: *[16]u8, block: []u32) void {
    const AA = readWord(u32, digest[iA .. iA + 4]);
    const BB = readWord(u32, digest[iB .. iB + 4]);
    const CC = readWord(u32, digest[iC .. iC + 4]);
    const DD = readWord(u32, digest[iD .. iD + 4]);

    round1(digest[0..16], block[0..16]);
    round2(digest[0..16], block[0..16]);
    round3(digest[0..16], block[0..16]);
    round4(digest[0..16], block[0..16]);

    const A = readWord(u32, digest[iA .. iA + 4]);
    const B = readWord(u32, digest[iB .. iB + 4]);
    const C = readWord(u32, digest[iC .. iC + 4]);
    const D = readWord(u32, digest[iD .. iD + 4]);
    var result: u32 = 0;
    _ = @addWithOverflow(u32, A, AA, &result);
    writeWord(u32, digest[iA .. iA + 4], result);
    _ = @addWithOverflow(u32, B, BB, &result);
    writeWord(u32, digest[iB .. iB + 4], result);
    _ = @addWithOverflow(u32, C, CC, &result);
    writeWord(u32, digest[iC .. iC + 4], result);
    _ = @addWithOverflow(u32, D, DD, &result);
    writeWord(u32, digest[iD .. iD + 4], result);
}

And that’s it! Now the tricky part, the actual hashing logic. Now we need to define the helper “auxiliary” functions and some constants.

The Auxiliaries

The core of the MD5 algorithm uses four auxiliary bit-wise functions that accept three 32-bit values and result in a single 32-bit value. Those are:

fn F(x: u32, y: u32, z: u32) u32 {
   return (x & y) | (~x & z);
}
fn G(x: u32, y: u32, z: u32) u32 {
    return x & z | (y & ~z);
}
fn H(x: u32, y: u32, z: u32) u32 {
    return x ^ y ^ z;
}
fn I(x: u32, y: u32, z: u32) u32 {
    return y ^ (x | ~z);
}

To define the lookup table we will use a nice feature of Zig called compile time evaluation (a.k.a. comptime). The table of magic values used is defined as:

[…] a 64-element table T[1 … 64] […] Let T[i] denote the i-th element of the table, which is equal to the integer part of 4294967296 times abs(sin(i)), where i is in radians.

Okay that’s straightforward. Here’s the corresponding Zig:

fn calcTableT() [65]u32 {
    var table: [65]u32 = [_]u32{0} ** 65;
    var i: usize = 0;
    const modulo = std.math.pow(u64, 2, 32);
    while (i < 65): (i += 1) {
        const lhs = @intToFloat(f64, modulo);
        const rhs = std.math.absFloat(std.math.sin(@intToFloat(f64, @intCast(u32, i))));
        const val = @floatToInt(u64, lhs * rhs);
        table[i] = @truncate(u32, val);
    }
    return table;
}
const T = calcTableT();

Note that we specified 65 elements in the table, this is for brevity and so we can use the same indices as the specification. Fast implementations of MD5 wouldn’t do this.

The Rounds

Finally, the most tedious part of MD5, the computation rounds. Each round uses magic numbers and does computation on the digest by rotating the where each 32-bit value is used. Note that each “round” takes two inputs: the current digest state and the current message block.

Each round mutates the digest state using the table T and auxiliary functions F through I we defined earlier. The rounds do 16 computations each:

/* Round 1. */
/* Let [abcd k s i] denote the operation
     a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s). */
/* Do the following 16 operations. */
[ABCD  0  7  1]  [DABC  1 12  2]  [CDAB  2 17  3]  [BCDA  3 22  4]
[ABCD  4  7  5]  [DABC  5 12  6]  [CDAB  6 17  7]  [BCDA  7 22  8]
[ABCD  8  7  9]  [DABC  9 12 10]  [CDAB 10 17 11]  [BCDA 11 22 12]
[ABCD 12  7 13]  [DABC 13 12 14]  [CDAB 14 17 15]  [BCDA 15 22 16]

/* Round 2. */
/* Let [abcd k s i] denote the operation
     a = b + ((a + G(b,c,d) + X[k] + T[i]) <<< s). */
/* Do the following 16 operations. */
[ABCD  1  5 17]  [DABC  6  9 18]  [CDAB 11 14 19]  [BCDA  0 20 20]
[ABCD  5  5 21]  [DABC 10  9 22]  [CDAB 15 14 23]  [BCDA  4 20 24]
[ABCD  9  5 25]  [DABC 14  9 26]  [CDAB  3 14 27]  [BCDA  8 20 28]
[ABCD 13  5 29]  [DABC  2  9 30]  [CDAB  7 14 31]  [BCDA 12 20 32]

/* Round 3. */
/* Let [abcd k s i] denote the operation
     a = b + ((a + H(b,c,d) + X[k] + T[i]) <<< s). */
/* Do the following 16 operations. */
[ABCD  5  4 33]  [DABC  8 11 34]  [CDAB 11 16 35]  [BCDA 14 23 36]
[ABCD  1  4 37]  [DABC  4 11 38]  [CDAB  7 16 39]  [BCDA 10 23 40]
[ABCD 13  4 41]  [DABC  0 11 42]  [CDAB  3 16 43]  [BCDA  6 23 44]
[ABCD  9  4 45]  [DABC 12 11 46]  [CDAB 15 16 47]  [BCDA  2 23 48]

/* Round 4. */
/* Let [abcd k s i] denote the operation
     a = b + ((a + I(b,c,d) + X[k] + T[i]) <<< s). */
/* Do the following 16 operations. */
[ABCD  0  6 49]  [DABC  7 10 50]  [CDAB 14 15 51]  [BCDA  5 21 52]
[ABCD 12  6 53]  [DABC  3 10 54]  [CDAB 10 15 55]  [BCDA  1 21 56]
[ABCD  8  6 57]  [DABC 15 10 58]  [CDAB  6 15 59]  [BCDA 13 21 60]
[ABCD  4  6 61]  [DABC 11 10 62]  [CDAB  2 15 63]  [BCDA  9 21 64]

Unfortunately there’s no pretty or compact way to write this code, so here it is:

fn round1(digest: *[16]u8, block: []u32) void {
    round1Func(iA, iB, iC, iD, digest, block, 0, 7, 1);
    round1Func(iD, iA, iB, iC, digest, block, 1, 12, 2);
    round1Func(iC, iD, iA, iB, digest, block, 2, 17, 3);
    round1Func(iB, iC, iD, iA, digest, block, 3, 22, 4);
    round1Func(iA, iB, iC, iD, digest, block, 4, 7, 5);
    round1Func(iD, iA, iB, iC, digest, block, 5, 12, 6);
    round1Func(iC, iD, iA, iB, digest, block, 6, 17, 7);
    round1Func(iB, iC, iD, iA, digest, block, 7, 22, 8);
    round1Func(iA, iB, iC, iD, digest, block, 8, 7, 9);
    round1Func(iD, iA, iB, iC, digest, block, 9, 12, 10);
    round1Func(iC, iD, iA, iB, digest, block, 10, 17, 11);
    round1Func(iB, iC, iD, iA, digest, block, 11, 22, 12);
    round1Func(iA, iB, iC, iD, digest, block, 12, 7, 13);
    round1Func(iD, iA, iB, iC, digest, block, 13, 12, 14);
    round1Func(iC, iD, iA, iB, digest, block, 14, 17, 15);
    round1Func(iB, iC, iD, iA, digest, block, 15, 22, 16);
}
fn round2(digest: *[16]u8, block: []u32) void {
    round2Func(iA, iB, iC, iD, digest, block, 1, 5, 17);
    round2Func(iD, iA, iB, iC, digest, block, 6, 9, 18);
    round2Func(iC, iD, iA, iB, digest, block, 11, 14, 19);
    round2Func(iB, iC, iD, iA, digest, block, 0, 20, 20);
    round2Func(iA, iB, iC, iD, digest, block, 5, 5, 21);
    round2Func(iD, iA, iB, iC, digest, block, 10, 9, 22);
    round2Func(iC, iD, iA, iB, digest, block, 15, 14, 23);
    round2Func(iB, iC, iD, iA, digest, block, 4, 20, 24);
    round2Func(iA, iB, iC, iD, digest, block, 9, 5, 25);
    round2Func(iD, iA, iB, iC, digest, block, 14, 9, 26);
    round2Func(iC, iD, iA, iB, digest, block, 3, 14, 27);
    round2Func(iB, iC, iD, iA, digest, block, 8, 20, 28);
    round2Func(iA, iB, iC, iD, digest, block, 13, 5, 29);
    round2Func(iD, iA, iB, iC, digest, block, 2, 9, 30);
    round2Func(iC, iD, iA, iB, digest, block, 7, 14, 31);
    round2Func(iB, iC, iD, iA, digest, block, 12, 20, 32);
}
fn round3(digest: *[16]u8, block: []u32) void {
    round3Func(iA, iB, iC, iD, digest, block, 5, 4, 33);
    round3Func(iD, iA, iB, iC, digest, block, 8, 11, 34);
    round3Func(iC, iD, iA, iB, digest, block, 11, 16, 35);
    round3Func(iB, iC, iD, iA, digest, block, 14, 23, 36);
    round3Func(iA, iB, iC, iD, digest, block, 1, 4, 37);
    round3Func(iD, iA, iB, iC, digest, block, 4, 11, 38);
    round3Func(iC, iD, iA, iB, digest, block, 7, 16, 39);
    round3Func(iB, iC, iD, iA, digest, block, 10, 23, 40);
    round3Func(iA, iB, iC, iD, digest, block, 13, 4, 41);
    round3Func(iD, iA, iB, iC, digest, block, 0, 11, 42);
    round3Func(iC, iD, iA, iB, digest, block, 3, 16, 43);
    round3Func(iB, iC, iD, iA, digest, block, 6, 23, 44);
    round3Func(iA, iB, iC, iD, digest, block, 9, 4, 45);
    round3Func(iD, iA, iB, iC, digest, block, 12, 11, 46);
    round3Func(iC, iD, iA, iB, digest, block, 15, 16, 47);
    round3Func(iB, iC, iD, iA, digest, block, 2, 23, 48);
}
fn round4(digest: *[16]u8, block: []u32) void {
    round4Func(iA, iB, iC, iD, digest, block, 0, 6, 49);
    round4Func(iD, iA, iB, iC, digest, block, 7, 10, 50);
    round4Func(iC, iD, iA, iB, digest, block, 14, 15, 51);
    round4Func(iB, iC, iD, iA, digest, block, 5, 21, 52);
    round4Func(iA, iB, iC, iD, digest, block, 12, 6, 53);
    round4Func(iD, iA, iB, iC, digest, block, 3, 10, 54);
    round4Func(iC, iD, iA, iB, digest, block, 10, 15, 55);
    round4Func(iB, iC, iD, iA, digest, block, 1, 21, 56);
    round4Func(iA, iB, iC, iD, digest, block, 8, 6, 57);
    round4Func(iD, iA, iB, iC, digest, block, 15, 10, 58);
    round4Func(iC, iD, iA, iB, digest, block, 6, 15, 59);
    round4Func(iB, iC, iD, iA, digest, block, 13, 21, 60);
    round4Func(iA, iB, iC, iD, digest, block, 4, 6, 61);
    round4Func(iD, iA, iB, iC, digest, block, 11, 10, 62);
    round4Func(iC, iD, iA, iB, digest, block, 2, 15, 63);
    round4Func(iB, iC, iD, iA, digest, block, 9, 21, 64);
}
fn rotateLeft(num: u32, bits: u5) u32 {
    const upper = (num << bits) & 0xffffffff;
    const lower = (num >> @intCast(u5, (32 - @intCast(u32, bits)))) & 0xffffffff;

    return upper | lower;
}
fn round1Func(nA: usize, nB: usize, nC: usize, nD: usize, digest: *[16]u8, X: []u32, k: usize, s: u5, i: u32) void {
    const A: u64 = readWord(u32, digest[nA..].ptr[0..4]);
    const B: u32 = readWord(u32, digest[nB..].ptr[0..4]);
    const C: u32 = readWord(u32, digest[nC..].ptr[0..4]);
    const D: u32 = readWord(u32, digest[nD..].ptr[0..4]);
    const to_rotate = @truncate(u32, A + F(B, C, D) + X[k] + T[i]);
    const rotated = rotateLeft(to_rotate, s);
    var ret: u32 = 0;
    _ = @addWithOverflow(u32, B, rotated, &ret);
    writeWord(u32, digest[nA..].ptr[0..4], ret);
}
fn round2Func(nA: usize, nB: usize, nC: usize, nD: usize, digest: *[16]u8, X: []u32, k: usize, s: u5, i: u32) void {
    const A: u64 = readWord(u32, digest[nA..].ptr[0..4]);
    const B: u32 = readWord(u32, digest[nB..].ptr[0..4]);
    const C: u32 = readWord(u32, digest[nC..].ptr[0..4]);
    const D: u32 = readWord(u32, digest[nD..].ptr[0..4]);

    const to_rotate = @truncate(u32, A + G(B, C, D) + X[k] + T[i]);
    const rotated = rotateLeft(to_rotate, s);
    var ret: u32 = 0;
    _ = @addWithOverflow(u32, B, rotated, &ret);
    writeWord(u32, digest[nA..].ptr[0..4], ret);
}
fn round3Func(nA: usize, nB: usize, nC: usize, nD: usize, digest: *[16]u8, X: []u32, k: usize, s: u5, i: u32) void {
    const A: u64 = readWord(u32, digest[nA..].ptr[0..4]);
    const B: u32 = readWord(u32, digest[nB..].ptr[0..4]);
    const C: u32 = readWord(u32, digest[nC..].ptr[0..4]);
    const D: u32 = readWord(u32, digest[nD..].ptr[0..4]);

    const to_rotate = @truncate(u32, A + H(B, C, D) + X[k] + T[i]);
    const rotated = rotateLeft(to_rotate, s);
    var ret: u32 = 0;
    _ = @addWithOverflow(u32, B, rotated, &ret);
    writeWord(u32, digest[nA..].ptr[0..4], ret);
}
fn round4Func(nA: usize, nB: usize, nC: usize, nD: usize, digest: *[16]u8, X: []u32, k: usize, s: u5, i: u32) void {
    const A: u64 = readWord(u32, digest[nA..].ptr[0..4]);
    const B: u32 = readWord(u32, digest[nB..].ptr[0..4]);
    const C: u32 = readWord(u32, digest[nC..].ptr[0..4]);
    const D: u32 = readWord(u32, digest[nD..].ptr[0..4]);

    const to_rotate = @truncate(u32, A + I(B, C, D) + X[k] + T[i]);
    const rotated = rotateLeft(to_rotate, s);
    var ret: u32 = 0;
    _ = @addWithOverflow(u32, B, rotated, &ret);
    writeWord(u32, digest[nA..].ptr[0..4], ret);
}

Whew. Not to crazy – or complicated – just a lot of repetition. Note the use of @addWithOverflow again.

Putting it All Together

With core MD5 algorithm out of the way we need to actually calculate the hash of something. Let’s do that now.

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    var alloc = gpa.allocator();
    defer _ = gpa.deinit();

    const msg = "message digest";
    const msgDigest = try calcMD5(msg, alloc);
    defer _ = alloc.free(msgDigest);

    for (msgDigest) |ch| {
        try std.io.getStdOut().writer().print("{x:0>2}", .{ch});
    }
    try std.io.getStdOut().writer().print("\n", .{});
}

But does this 💩 work? Let’s compare to a known good implementation md5sum:

$ echo -n 'message digest' | md5sum
f96b697d7cb7938d525a2f31aaf161d0  -
$ zig run md5.zig
f96b697d7cb7938d525a2f31aaf161d0

Yay 🎉 it worked 🏆!

Conclusion

In this post we implemented the MD5 hashing algorithm in Zig. We wrote a very barebones unoptimized version. A future project for the reader could be improving the implementation to only allocate the padding in the final blocks of a hashed message.

In case you have issues reproducing/compiling the code here, then try downloading md5.zig and compiling.

NB: The code in this post’s text is licensed under the GNU Public License Version 3 or any later version at your choice.

Cam's Cognition