Choosing a hash function for coinflips in A/B testing

26 March, 2024
by Mark Teisman

A common way to do the coinflip in A/B testing, or online controlled experiments, is to hash the user identifier together with a seed that is associated with the experiment instance.
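
As a minimal sketch of that idea, the snippet below hashes a seed and a user identifier and turns the result into an assignment. The names `assign_variant`, `experiment_seed` and `treatment_share` are made up for illustration, and SHA-256 from Python's standard library is used only as a stand-in so the example runs without extra dependencies; it is not a recommendation over the options discussed later.

```python
import hashlib


def assign_variant(user_id: str, experiment_seed: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a (seed, user) pair to 'treatment' or 'control'."""
    # Combine the seed and the user id so that each experiment gets its own,
    # independent split of the same user population.
    key = f"{experiment_seed}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()

    # Read the first 8 bytes as an unsigned 64-bit integer with an explicit
    # byte order, then reduce it to one of 10,000 buckets. The modulo bias is
    # negligible because 2**64 is vastly larger than 10,000.
    value = int.from_bytes(digest[:8], "big")
    bucket = value % 10_000

    return "treatment" if bucket < treatment_share * 10_000 else "control"
```

Calling `assign_variant("user-42", "exp-1")` repeatedly gives the same answer every time, while changing the seed reshuffles users independently of earlier experiments.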

There are many algorithms that could be used to produce this hash. The main functional requirements for the hash function are:

The implementation should produce outputs that are uniformly distributed across the output space.
The implementation should be deterministic, in that for a given input it should always arrive at the same output.
The implementation should produce an avalanche effect, where changing one bit of the input should have a 50% chance of flipping each bit in the output.
The implementation should be performant.
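
The first three requirements can be checked empirically. The sketch below is a rough illustration, not a replacement for a proper test suite: `hash64`, `bucket_counts` and `avalanche_ratio` are hypothetical helpers, and SHA-256 again stands in for whichever hash function you are evaluating.

```python
import hashlib


def hash64(data: bytes) -> int:
    """Stand-in 64-bit hash: the first 8 bytes of SHA-256, read big-endian."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def bucket_counts(n_inputs: int, n_buckets: int) -> list[int]:
    """Uniformity check: count how many synthetic user ids land in each bucket."""
    counts = [0] * n_buckets
    for i in range(n_inputs):
        counts[hash64(f"user-{i}".encode("utf-8")) % n_buckets] += 1
    return counts


def avalanche_ratio(n_inputs: int) -> float:
    """Avalanche check: average fraction of output bits that change
    when a single input bit is flipped. The ideal value is 0.5."""
    flipped = 0
    total = 0
    for i in range(n_inputs):
        data = f"user-{i}".encode("utf-8")
        base = hash64(data)
        for bit in range(len(data) * 8):
            mutated = bytearray(data)
            mutated[bit // 8] ^= 1 << (bit % 8)  # flip exactly one input bit
            diff = base ^ hash64(bytes(mutated))
            flipped += bin(diff).count("1")      # number of output bits that changed
            total += 64
    return flipped / total
```

For a well-behaved hash, `bucket_counts(10_000, 10)` should give counts close to 1,000 each, and `avalanche_ratio(200)` should be close to 0.5. Determinism follows if `hash64` returns the same value for the same bytes on every call.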

Then there are some optional considerations. For instance, is it critical that the hash function produces the same output across different architectures? Architectures differ in word size (32-bit versus 64-bit) and in endianness (big-endian versus little-endian), and the output of a given hash implementation may differ between them. In general, I think it's fair to choose an implementation that delivers consistent hashes across architectures, so that you don't place any constraints on the compatibility of the clients you have now or in the future.
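
One place where the consistency concern above shows up is in how the input is turned into bytes before hashing. The sketch below assumes numeric user ids and seeds, and uses the hypothetical helper `hash_user_id`, to illustrate fixing the byte order and integer width explicitly instead of relying on the platform's native layout. SHA-256 is again only a stand-in hash.

```python
import hashlib
import struct


def hash_user_id(user_id: int, seed: int) -> int:
    """Hash a numeric (seed, user id) pair with a fixed byte layout."""
    # "<QQ" means: little-endian, two unsigned 64-bit integers with standard
    # sizes. A native format (no prefix, or "@") would depend on the platform
    # the code runs on.
    payload = struct.pack("<QQ", seed, user_id)
    digest = hashlib.sha256(payload).digest()
    # The output is also decoded with an explicit byte order.
    return int.from_bytes(digest[:8], "big")


# The same integer serialises to different bytes under the two byte orders,
# so clients that disagree on the layout would hash different inputs.
little = struct.pack("<I", 1)  # b"\x01\x00\x00\x00"
big = struct.pack(">I", 1)     # b"\x00\x00\x00\x01"
```

With the layout pinned down like this, two clients on different architectures feed identical bytes into the hash and therefore agree on the assignment.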

As for which algorithms to choose from, here are some options you could consider.

From this list, I would probably pick xxHash, which is highly performant and meets all of the previously stated requirements. xxHash passes the SMHasher test suite, which includes performance tests, differential tests, avalanche tests and more.