Cracking a weak hash function
A few years ago, I found a job offer from a software company that required you to break a hash function in order to be able to apply. The hash function itself was pretty weak, and the constraints were sensible enough for the requirement to make sense. Since then, the company has taken the offer down, and as far as I know, they do not use the same exercise anymore, which is why I decided to post my solution.
I cannot find an archive page of the job offer description, so I’ll paraphrase from memory.
Given this hash function:
def hash(s): h = 7 letters = "acdegilmnoprstuw" for c in s: h = h * 37 + letters.find(c) return h
(notice that Python’s
.find() returns the index at which the character is found, or
-1 if it isn’t, like
indexOf does in other languages)
Find a 9-letter string of characters such that
string can only contain letters from
For this explanation, I will ask you to assume that I’m a 5 year old. No advanced math here, just pure logic that anybody can understand and follow.
If you look at the hash closely, you will notice that it starts with a fixed value
7, and that it then multiplies that value by
37 and adds some (negligible for now) offset. This means that, for every character there is in the string to be hashed, the value of
h will be multiplied by
37 and some negligible value, up to
15, will be added.
With this information alone, you know that, as long as the length of the input string increases, the resulting hash will also increase, i.e. the value of the hash is linearly proportional (somewhat) to the input string. We do know that the length will be
9, which means there will be a fixed amount of iterations in
The value that’s added after multiplying is simply an offset, a value from
15, and that’s the only thing that can change between different possible values to be fed into the
hash() function. That is, the smallest possible value we can get with these constraints is
hash("aaaaaaaaa") -> 909732178565539, and the biggest possible value is
hash("wwwwwwwww") -> 963882903480154.
A necessary consequence of all this is that
hash("bbbbbbbbb") < hash("bbbbbbbbc") < hash("bbbbbbbbd"). Another thing is that, because of the cumulative design of the hash, characters at the end will produce much smaller differences than characters at the beginning; that is,
hash("wwwwwwwww") - hash("wwwwwwwwa") will produce a much smaller difference than
hash("wwwwwwwww") - hash("awwwwwwww") will:
>>> hash("wwwwwwwww") - hash("wwwwwwwwa") 15L >>> hash("wwwwwwwww") - hash("awwwwwwww") 52687191808815L
With this knowledge, we now know that, as long as the string “increases” (from
wwwwwwwww), the value of the hash will increase as well, linearly. Also, each character has less and less impact on the final value of the hash as we progress over the string - i.e. the first character will make the biggest difference in value, while the last will make the least.
Therefore, the only thing necessary for this task is to progressively increase characters at the start of the string, and as soon as we surpass that value, we got the proper character (which is the previous one, as the further characters in the string will always add some numerical value to the final hash).
My code is at the end of this post1. However, the necessary steps can be very easily reproduced from this output:
uaaaaaaaa: too big (958906890920433) # first char is 'u'? no - too big, try something that would make the hash smaller taaaaaaaa: too big (955394411466512) # 't' makes the hash smaller, but it is still too big ... paaaaaaaa: too small (944856973104749) # the first letter is 'p' - move on to the next, because now the hash is too small puaaaaaaa: too big (946186019384611) ... praaaaaaa: too small (945901223753212) # the next one is 'r' - move on to the next pruaaaaaa: too big (945937143922938) ... proaaaaaa: too small (945924315290893) # next one is 'o' - move on to the next ...
/r/hackcasual on reddit pointed out that you don’t even need to guess like this. That is, for the expression:
a * 37 + x = b
you can solve for
x with modulus even though you only know
b. The reasoning, I quote, is:
Let’s say I give you an expression a * 37 + x = b. Even though you only know b, you can solve for x with modulus. Think of it like this, let’s say I give you a * 10 + x = 74. As long as a and x are integers, you know x must be 4.
def crackHash(hv): while hv > 37: print(hv % 37) hv /= 37
that cracks it in reverse order