Terrible Cryptography with Zig: MD5
Jul 23, 2022 “We're gonna hash like it's 1992.”#md5 #zig #cryptography #programming
I’ve picked up something of an interest in cryptography recently, so I decided to reimplement an old-school (and insecure) hashing algorithm: MD5.
In case you don’t live under a rock keep up with technology, you’ve may have
heard of MD5 before and know its
history. So please forgive me for the following disclaimer:
Disclaimer
MD5 is an insecure and broken hash function. Do not use it in actual
applications that need security. It is only used in this post for pedagogical
purposes. The author is not (yet) a competent cryptographer, so consume
this information with a healthy dose of salt.
With that out of the way, let’s begin.
Message Digest 5
MD5 is Merkle-Damgård hash function, which means that it accepts as input a large quantity of data and maps it into a smaller fixed size, which is hopefully meaningful.
For the implementation we will use Zig, a great replacement for C. We will not use the Holy K&R C that God gave to Abraham on Mount Bell Labs. We use Zig for great justice.
MD5 is relatively simple and a spec exists so we can jump straight into an implementation. Starting from the beginning:
The Specification
Network Working Group R. Rivest
Request for Comments: 1321 MIT Laboratory for Computer Science
and RSA Data Security, Inc.
April 1992
The MD5 Message-Digest Algorithm
Alright, we’re in the right place. Let’s skip the boring stuff and extract the important parts of the “summary.”
This document describes the MD5 message-digest algorithm. The algorithm takes as input a message of arbitrary length and produces as output a 128-bit “fingerprint” or “message digest” of the input. […] The MD5 algorithm is intended for digital signature applications, where a large file must be “compressed”[.] […] The MD5 algorithm is designed to be quite fast on 32-bit machines. […]
Okay it’s a hashing function (we already knew that) that takes large messages
and generates a fingerprint describing the original large message. We also know
MD5 is optimized for 32-bit machines, so we can expect decent performance on
modern processors.
For the purposes of this post, we will only handle messages that have
lengths in bit-multiples of 8 (full bytes only); this will make our
implementation easier. For teaching purposes, we will also make a duplicate of our
message to work on to make padding easier. (This is also terribly inefficient).
Append and Pad
First we go to §3.1, Step 1: Append Padding. We need to split and pad our message payload into zero or more blocks of 512 bits, and a final trailing block of 448 bits with the leftovers.
On the last block we need to add a single 1
bit, and then keep
appending zeros to that block until we get a final block length of M bits where
M ≡ 448 (mod 512)
. HINT: If that equation confused you, go read the wiki page on
modular arithmetic then come
back.
If we end up padding with 0’s into another next block that’s fine, we’ll
just add the first 512 bits of the “padding” blocks to the other 512-bit
blocks and only continue with the final block, which ends up 64 bits shy of
512 at 448 bits.
Our padding code looks something like this:
const std = @import("std");
const Message = std.ArrayList(u8);
fn appendPaddingAndLength(bytes: []const u8, alloc: std.mem.Allocator) ![]u8 {
var arrList = Message.init(alloc);
var targetSize: usize = bytes.len;
while (@rem(targetSize, 512/8) != 448/8 or targetSize <= bytes.len + 64/8) {
targetSize += 1;
}
try arrList.ensureTotalCapacity(targetSize + 8);
try arrList.appendSlice(bytes[0..]);
try arrList.append(0x80);
while (arrList.items.len % (512/8) != (448/8)) {
try arrList.append(0x00);
}
// continued below
So for our final block we should now have a length of 448 bits (56 bytes).
Now for §3.2, Step 2: Append Length. The RFC says:
A 64-bit representation of b (the length of the message before the padding bits were added) is appended to the result of the previous step. […] the resulting message […] has a length that is an exact multiple of 512 bits. […]
Okay. Now let’s write the bottom 64 bits of the message length (in bytes) to that block to round it out to 512 bits (64 bytes):
// continued from above
const used_len = @truncate(u64, @intCast(u128, bytes.len) * 8);
const pos = arrList.items.len;
var i: i32 = 0;
while (i < 8) : (i += 1) {
try arrList.append(0);
}
// the spec specifies the appended length is Little Endian
std.mem.writeIntLittle(u64, arrList.items[pos..].ptr[0..8], used_len);
return arrList.toOwnedSlice();
}
Now that we’re done with the prep work, we can do actual hashing. It’s time for §3.3, Step 3: Initialize the Buffer. MD5 uses an 16-byte internal state, which is initially defined as:
const digest_initial: [16]u8 align(@alignOf(u32)) = [16]u8{
0x01, 0x23, 0x45, 0x67,
0x89, 0xAB, 0xCD, 0xEF,
0xFE, 0xDC, 0xBA, 0x98,
0x76, 0x54, 0x32, 0x10,
};
The current state of the message digest is represented as four 32 bit little-endian integers. For example if the state was:
0xA4A3A2A1,
0xB4B3B2B1,
0xC4C3C2C1,
0xD4D3D2D1
then our digest represented as bytes would be:
0xA1, 0xA2, 0xA3, 0xA4,
0xB1, 0xB2, 0xB3, 0xB4,
0xC1, 0xC2, 0xC3, 0xC4,
0xD1, 0xD2, 0xD3, 0xD4
Now we’re starting to get somewhere. It’s time for meat of the MD5 algorithm.
The “Core” Algorithm
MD5 does four rounds of computation on each block then passes on the existing state to the next block. It looks something like this
fn calcMD5(bytes: []const u8, alloc: std.mem.Allocator) !*[16]u8 {
var padded = try appendPaddingAndLength(bytes[0..], alloc);
defer _ = alloc.free(padded);
var digest = try alloc.alloc(u8, 16);
@memcpy(digest.ptr, digest_initial[0..], 16);
const N = (padded.len + 8) / (512/8);
var msg = @intToPtr([*]u32, @ptrToInt(padded.ptr));
var i: usize = 0;
while (i < N): (i += 1) {
const start = i * 16;
const end = i * 16 + 16;
processBlock(digest[0..16], msg[start..end]);
}
return digest[0..16];
}
Then we can write processBlock
. Note the code uses the
@addWithOverflow
Zig builtin because the addition in MD5 is Modulo 2^32, but Zig
detects integer overflows and reports if they occur unless we use the builtin. We
don’t care about the overflow in this case, so we tell Zig to ignore it.
// offset indices in bytes for each 32-bit "word" in MD5 digest
const iA = 0;
const iB = 4;
const iC = 8;
const iD = 12;
// helper functions
const readWord = std.mem.readIntLittle;
const writeWord = std.mem.writeIntLittle;
// core algorithm of MD5. See §3.4, Step 4
fn processBlock(digest: *[16]u8, block: []u32) void {
const AA = readWord(u32, digest[iA .. iA + 4]);
const BB = readWord(u32, digest[iB .. iB + 4]);
const CC = readWord(u32, digest[iC .. iC + 4]);
const DD = readWord(u32, digest[iD .. iD + 4]);
round1(digest[0..16], block[0..16]);
round2(digest[0..16], block[0..16]);
round3(digest[0..16], block[0..16]);
round4(digest[0..16], block[0..16]);
const A = readWord(u32, digest[iA .. iA + 4]);
const B = readWord(u32, digest[iB .. iB + 4]);
const C = readWord(u32, digest[iC .. iC + 4]);
const D = readWord(u32, digest[iD .. iD + 4]);
var result: u32 = 0;
_ = @addWithOverflow(u32, A, AA, &result);
writeWord(u32, digest[iA .. iA + 4], result);
_ = @addWithOverflow(u32, B, BB, &result);
writeWord(u32, digest[iB .. iB + 4], result);
_ = @addWithOverflow(u32, C, CC, &result);
writeWord(u32, digest[iC .. iC + 4], result);
_ = @addWithOverflow(u32, D, DD, &result);
writeWord(u32, digest[iD .. iD + 4], result);
}
And that’s it! Now the tricky part, the actual hashing logic. Now we need to define the helper “auxiliary” functions and some constants.
The Auxiliaries
The core of the MD5 algorithm uses four auxiliary bit-wise functions that accept three 32-bit values and result in a single 32-bit value. Those are:
fn F(x: u32, y: u32, z: u32) u32 {
return (x & y) | (~x & z);
}
fn G(x: u32, y: u32, z: u32) u32 {
return x & z | (y & ~z);
}
fn H(x: u32, y: u32, z: u32) u32 {
return x ^ y ^ z;
}
fn I(x: u32, y: u32, z: u32) u32 {
return y ^ (x | ~z);
}
To define the lookup table we will use a nice feature of Zig called compile time
evaluation (a.k.a. comptime
). The table of magic values used is defined as:
[…] a 64-element table T[1 … 64] […] Let T[i] denote the i-th element of the table, which is equal to the integer part of 4294967296 times abs(sin(i)), where i is in radians.
Okay that’s straightforward. Here’s the corresponding Zig:
fn calcTableT() [65]u32 {
var table: [65]u32 = [_]u32{0} ** 65;
var i: usize = 0;
const modulo = std.math.pow(u64, 2, 32);
while (i < 65): (i += 1) {
const lhs = @intToFloat(f64, modulo);
const rhs = std.math.absFloat(std.math.sin(@intToFloat(f64, @intCast(u32, i))));
const val = @floatToInt(u64, lhs * rhs);
table[i] = @truncate(u32, val);
}
return table;
}
const T = calcTableT();
Note that we specified 65 elements in the table, this is for brevity and so we can use the same indices as the specification. Fast implementations of MD5 wouldn’t do this.
The Rounds
Finally, the most tedious part of MD5, the computation rounds. Each round uses magic numbers and does computation on the digest by rotating the where each 32-bit value is used. Note that each “round” takes two inputs: the current digest state and the current message block.
Each round mutates the digest state using the table T
and auxiliary functions
F
through I
we defined earlier. The rounds do 16 computations each:
/* Round 1. */
/* Let [abcd k s i] denote the operation
a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s). */
/* Do the following 16 operations. */
[ABCD 0 7 1] [DABC 1 12 2] [CDAB 2 17 3] [BCDA 3 22 4]
[ABCD 4 7 5] [DABC 5 12 6] [CDAB 6 17 7] [BCDA 7 22 8]
[ABCD 8 7 9] [DABC 9 12 10] [CDAB 10 17 11] [BCDA 11 22 12]
[ABCD 12 7 13] [DABC 13 12 14] [CDAB 14 17 15] [BCDA 15 22 16]
/* Round 2. */
/* Let [abcd k s i] denote the operation
a = b + ((a + G(b,c,d) + X[k] + T[i]) <<< s). */
/* Do the following 16 operations. */
[ABCD 1 5 17] [DABC 6 9 18] [CDAB 11 14 19] [BCDA 0 20 20]
[ABCD 5 5 21] [DABC 10 9 22] [CDAB 15 14 23] [BCDA 4 20 24]
[ABCD 9 5 25] [DABC 14 9 26] [CDAB 3 14 27] [BCDA 8 20 28]
[ABCD 13 5 29] [DABC 2 9 30] [CDAB 7 14 31] [BCDA 12 20 32]
/* Round 3. */
/* Let [abcd k s i] denote the operation
a = b + ((a + H(b,c,d) + X[k] + T[i]) <<< s). */
/* Do the following 16 operations. */
[ABCD 5 4 33] [DABC 8 11 34] [CDAB 11 16 35] [BCDA 14 23 36]
[ABCD 1 4 37] [DABC 4 11 38] [CDAB 7 16 39] [BCDA 10 23 40]
[ABCD 13 4 41] [DABC 0 11 42] [CDAB 3 16 43] [BCDA 6 23 44]
[ABCD 9 4 45] [DABC 12 11 46] [CDAB 15 16 47] [BCDA 2 23 48]
/* Round 4. */
/* Let [abcd k s i] denote the operation
a = b + ((a + I(b,c,d) + X[k] + T[i]) <<< s). */
/* Do the following 16 operations. */
[ABCD 0 6 49] [DABC 7 10 50] [CDAB 14 15 51] [BCDA 5 21 52]
[ABCD 12 6 53] [DABC 3 10 54] [CDAB 10 15 55] [BCDA 1 21 56]
[ABCD 8 6 57] [DABC 15 10 58] [CDAB 6 15 59] [BCDA 13 21 60]
[ABCD 4 6 61] [DABC 11 10 62] [CDAB 2 15 63] [BCDA 9 21 64]
Unfortunately there’s no pretty or compact way to write this code, so here it is:
fn round1(digest: *[16]u8, block: []u32) void {
round1Func(iA, iB, iC, iD, digest, block, 0, 7, 1);
round1Func(iD, iA, iB, iC, digest, block, 1, 12, 2);
round1Func(iC, iD, iA, iB, digest, block, 2, 17, 3);
round1Func(iB, iC, iD, iA, digest, block, 3, 22, 4);
round1Func(iA, iB, iC, iD, digest, block, 4, 7, 5);
round1Func(iD, iA, iB, iC, digest, block, 5, 12, 6);
round1Func(iC, iD, iA, iB, digest, block, 6, 17, 7);
round1Func(iB, iC, iD, iA, digest, block, 7, 22, 8);
round1Func(iA, iB, iC, iD, digest, block, 8, 7, 9);
round1Func(iD, iA, iB, iC, digest, block, 9, 12, 10);
round1Func(iC, iD, iA, iB, digest, block, 10, 17, 11);
round1Func(iB, iC, iD, iA, digest, block, 11, 22, 12);
round1Func(iA, iB, iC, iD, digest, block, 12, 7, 13);
round1Func(iD, iA, iB, iC, digest, block, 13, 12, 14);
round1Func(iC, iD, iA, iB, digest, block, 14, 17, 15);
round1Func(iB, iC, iD, iA, digest, block, 15, 22, 16);
}
fn round2(digest: *[16]u8, block: []u32) void {
round2Func(iA, iB, iC, iD, digest, block, 1, 5, 17);
round2Func(iD, iA, iB, iC, digest, block, 6, 9, 18);
round2Func(iC, iD, iA, iB, digest, block, 11, 14, 19);
round2Func(iB, iC, iD, iA, digest, block, 0, 20, 20);
round2Func(iA, iB, iC, iD, digest, block, 5, 5, 21);
round2Func(iD, iA, iB, iC, digest, block, 10, 9, 22);
round2Func(iC, iD, iA, iB, digest, block, 15, 14, 23);
round2Func(iB, iC, iD, iA, digest, block, 4, 20, 24);
round2Func(iA, iB, iC, iD, digest, block, 9, 5, 25);
round2Func(iD, iA, iB, iC, digest, block, 14, 9, 26);
round2Func(iC, iD, iA, iB, digest, block, 3, 14, 27);
round2Func(iB, iC, iD, iA, digest, block, 8, 20, 28);
round2Func(iA, iB, iC, iD, digest, block, 13, 5, 29);
round2Func(iD, iA, iB, iC, digest, block, 2, 9, 30);
round2Func(iC, iD, iA, iB, digest, block, 7, 14, 31);
round2Func(iB, iC, iD, iA, digest, block, 12, 20, 32);
}
fn round3(digest: *[16]u8, block: []u32) void {
round3Func(iA, iB, iC, iD, digest, block, 5, 4, 33);
round3Func(iD, iA, iB, iC, digest, block, 8, 11, 34);
round3Func(iC, iD, iA, iB, digest, block, 11, 16, 35);
round3Func(iB, iC, iD, iA, digest, block, 14, 23, 36);
round3Func(iA, iB, iC, iD, digest, block, 1, 4, 37);
round3Func(iD, iA, iB, iC, digest, block, 4, 11, 38);
round3Func(iC, iD, iA, iB, digest, block, 7, 16, 39);
round3Func(iB, iC, iD, iA, digest, block, 10, 23, 40);
round3Func(iA, iB, iC, iD, digest, block, 13, 4, 41);
round3Func(iD, iA, iB, iC, digest, block, 0, 11, 42);
round3Func(iC, iD, iA, iB, digest, block, 3, 16, 43);
round3Func(iB, iC, iD, iA, digest, block, 6, 23, 44);
round3Func(iA, iB, iC, iD, digest, block, 9, 4, 45);
round3Func(iD, iA, iB, iC, digest, block, 12, 11, 46);
round3Func(iC, iD, iA, iB, digest, block, 15, 16, 47);
round3Func(iB, iC, iD, iA, digest, block, 2, 23, 48);
}
fn round4(digest: *[16]u8, block: []u32) void {
round4Func(iA, iB, iC, iD, digest, block, 0, 6, 49);
round4Func(iD, iA, iB, iC, digest, block, 7, 10, 50);
round4Func(iC, iD, iA, iB, digest, block, 14, 15, 51);
round4Func(iB, iC, iD, iA, digest, block, 5, 21, 52);
round4Func(iA, iB, iC, iD, digest, block, 12, 6, 53);
round4Func(iD, iA, iB, iC, digest, block, 3, 10, 54);
round4Func(iC, iD, iA, iB, digest, block, 10, 15, 55);
round4Func(iB, iC, iD, iA, digest, block, 1, 21, 56);
round4Func(iA, iB, iC, iD, digest, block, 8, 6, 57);
round4Func(iD, iA, iB, iC, digest, block, 15, 10, 58);
round4Func(iC, iD, iA, iB, digest, block, 6, 15, 59);
round4Func(iB, iC, iD, iA, digest, block, 13, 21, 60);
round4Func(iA, iB, iC, iD, digest, block, 4, 6, 61);
round4Func(iD, iA, iB, iC, digest, block, 11, 10, 62);
round4Func(iC, iD, iA, iB, digest, block, 2, 15, 63);
round4Func(iB, iC, iD, iA, digest, block, 9, 21, 64);
}
fn rotateLeft(num: u32, bits: u5) u32 {
const upper = (num << bits) & 0xffffffff;
const lower = (num >> @intCast(u5, (32 - @intCast(u32, bits)))) & 0xffffffff;
return upper | lower;
}
fn round1Func(nA: usize, nB: usize, nC: usize, nD: usize, digest: *[16]u8, X: []u32, k: usize, s: u5, i: u32) void {
const A: u64 = readWord(u32, digest[nA..].ptr[0..4]);
const B: u32 = readWord(u32, digest[nB..].ptr[0..4]);
const C: u32 = readWord(u32, digest[nC..].ptr[0..4]);
const D: u32 = readWord(u32, digest[nD..].ptr[0..4]);
const to_rotate = @truncate(u32, A + F(B, C, D) + X[k] + T[i]);
const rotated = rotateLeft(to_rotate, s);
var ret: u32 = 0;
_ = @addWithOverflow(u32, B, rotated, &ret);
writeWord(u32, digest[nA..].ptr[0..4], ret);
}
fn round2Func(nA: usize, nB: usize, nC: usize, nD: usize, digest: *[16]u8, X: []u32, k: usize, s: u5, i: u32) void {
const A: u64 = readWord(u32, digest[nA..].ptr[0..4]);
const B: u32 = readWord(u32, digest[nB..].ptr[0..4]);
const C: u32 = readWord(u32, digest[nC..].ptr[0..4]);
const D: u32 = readWord(u32, digest[nD..].ptr[0..4]);
const to_rotate = @truncate(u32, A + G(B, C, D) + X[k] + T[i]);
const rotated = rotateLeft(to_rotate, s);
var ret: u32 = 0;
_ = @addWithOverflow(u32, B, rotated, &ret);
writeWord(u32, digest[nA..].ptr[0..4], ret);
}
fn round3Func(nA: usize, nB: usize, nC: usize, nD: usize, digest: *[16]u8, X: []u32, k: usize, s: u5, i: u32) void {
const A: u64 = readWord(u32, digest[nA..].ptr[0..4]);
const B: u32 = readWord(u32, digest[nB..].ptr[0..4]);
const C: u32 = readWord(u32, digest[nC..].ptr[0..4]);
const D: u32 = readWord(u32, digest[nD..].ptr[0..4]);
const to_rotate = @truncate(u32, A + H(B, C, D) + X[k] + T[i]);
const rotated = rotateLeft(to_rotate, s);
var ret: u32 = 0;
_ = @addWithOverflow(u32, B, rotated, &ret);
writeWord(u32, digest[nA..].ptr[0..4], ret);
}
fn round4Func(nA: usize, nB: usize, nC: usize, nD: usize, digest: *[16]u8, X: []u32, k: usize, s: u5, i: u32) void {
const A: u64 = readWord(u32, digest[nA..].ptr[0..4]);
const B: u32 = readWord(u32, digest[nB..].ptr[0..4]);
const C: u32 = readWord(u32, digest[nC..].ptr[0..4]);
const D: u32 = readWord(u32, digest[nD..].ptr[0..4]);
const to_rotate = @truncate(u32, A + I(B, C, D) + X[k] + T[i]);
const rotated = rotateLeft(to_rotate, s);
var ret: u32 = 0;
_ = @addWithOverflow(u32, B, rotated, &ret);
writeWord(u32, digest[nA..].ptr[0..4], ret);
}
Whew. Not to crazy – or complicated – just a lot of repetition. Note the use
of @addWithOverflow
again.
Putting it All Together
With core MD5 algorithm out of the way we need to actually calculate the hash of something. Let’s do that now.
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
var alloc = gpa.allocator();
defer _ = gpa.deinit();
const msg = "message digest";
const msgDigest = try calcMD5(msg, alloc);
defer _ = alloc.free(msgDigest);
for (msgDigest) |ch| {
try std.io.getStdOut().writer().print("{x:0>2}", .{ch});
}
try std.io.getStdOut().writer().print("\n", .{});
}
But does this 💩 work? Let’s compare to a known good implementation md5sum
:
$ echo -n 'message digest' | md5sum
f96b697d7cb7938d525a2f31aaf161d0 -
$ zig run md5.zig
f96b697d7cb7938d525a2f31aaf161d0
Yay 🎉 it worked 🏆!
Conclusion
In this post we implemented the MD5 hashing algorithm in Zig. We wrote a very barebones unoptimized version. A future project for the reader could be improving the implementation to only allocate the padding in the final blocks of a hashed message.
In case you have issues reproducing/compiling the code here, then try downloading md5.zig and compiling.
NB: The code in this post’s text is licensed under the GNU Public License Version 3 or any later version at your choice.