During the course of this week, a number of high-profile websites (like LinkedIn and last.fm) have disclosed possible password leaks from their databases. The suspected leaks put huge amounts of important, private user data at risk.
What’s common to both these cases is the weak security they employed to “safekeep” their users’ login credentials. In the case of LinkedIn, it is alleged that an unsalted SHA-1 hash was used, in the case of last.fm, the technology used is, allegedly, an even worse, unsalted MD5 hash.
Neither of the two technologies is following any sort of modern industry standard and, if they were in fact used by these companies in this fashion, exhibit a gross disregard for the protection of user data. Let’s take a look at the most obvious mistakes our protagonists made here, and then we’ll discuss the password hashing standards that Mozilla web projects routinely apply in order to mitigate these risks.
A trivial no-no: Plain-text passwords
This one’s easy: Nobody should store plain-text passwords in a database. If you do, and someone steals the data through any sort of security hole, they’ve got all your user’s plain text passwords. (That a bunch of companies still do that should make you scream and run the other way whenever you encounter it.) Our two protagonists above know that too, so they remembered that they read something about hashing somewhere at some point. “Hey, this makes our passwords look different! I am sure it’s secure! Let’s do it!”
Poor: Straight hashing
Smart mathematicians came up with something called a hashing function or “one-way function”
H: password -> H(password). MD5 and SHA-1 mentioned above are examples of those. The idea is that you give this function an input (the password), and it gives you back a “hash value”. It is easy to calculate this hash value when you have the original input, but prohibitively hard to do the opposite. So we create the hash value of all passwords, and only store that. If someone steals the database, they will only have the hashes, not the passwords. And because those are hard or impossible to calculate from the hashes, the stolen data is useless.
“Great!” But wait, there’s a catch. For starters, people pick poor passwords. Write this one in stone, as it’ll be true as long as passwords exist. So a smart attacker can start with a copy of Merriam-Webster, throw in a few numbers here and there, calculate the hashes for all those words (remember, it’s easy and fast) and start comparing those hashes against the database they just stole. Because your password was “cheesecake1”, they just guessed it. Whoops! To add insult to injury, they just guessed everyone’s password who also used the same phrase, because the hashes for the same password are the same for every user.
Worse yet, you can actually buy(!) precomputed lists of straight hashes (called Rainbow Tables) for alphanumeric passwords up to about 10 characters in length. Thought “FhTsfdl31a” was a safe password? Think again.
This attack is called an offline dictionary attack and is well-known to the security community.
Even passwords taste better with salt
The standard way to deal with this is by adding a per-user salt. That’s a long, random string added to the password at hashing time:
H: password -> H(password + salt). You then store salt and hash in the database, making the hash different for every user, even if they happen to use the same password. In addition, the smart attacker cannot pre-compute the hashes anymore, because they don’t know your salt. So after stealing the data, they’ll have to try every possible password for every possible user, using each user’s personal salt value.
Great! I mean it, if you use this method, you’re already scores better than our protagonists.
The 21st century: Slow hashes
But alas, there’s another catch: Generic hash functions like MD5 and SHA-1 are built to be fast. And because computers keep getting faster, millions of hashes can be calculated very very quickly, making a brute-force attack even of salted passwords more and more feasible.
So here’s what we do at Mozilla: Our WebApp Security team performed some research and set forth a set of secure coding guidelines (they are public, go check them out, I’ll wait). These guidelines suggest the use of HMAC + bcrypt as a reasonably secure password storage method.
The hashing function has two steps. First, the password is hashed with an algorithm called HMAC, together with a local salt:
H: password -> HMAC(local_salt + password). The local salt is a random value that is stored only on the server, never in the database. Why is this good? If an attacker steals one of our password databases, they would need to also separately attack one of our web servers to get file access in order to discover this local salt value. If they don’t manage to pull off two successful attacks, their stolen data is largely useless.
As a second step, this hashed value (or strengthened password, as some call it) is then hashed again with a slow hashing function called bcrypt. The key point here is slow. Unlike general-purpose hash functions, bcrypt intentionally takes a relatively long time to be calculated. Unless an attacker has millions of years to spend, they won’t be able to try out a whole lot of passwords after they steal a password database. Plus, bcrypt hashes are also salted, so no two bcrypt hashes of the same password look the same.
So the whole function looks like:
H: password -> bcrypt(HMAC(password, local_salt), bcrypt_salt).
We wrote a reference implementation for this for Django: django-sha2. Like all Mozilla projects, it is open source, and you are more than welcome to study, use, and contribute to it!
What about Mozilla Persona?
Funny you should mention it. Mozilla Persona (née BrowserID) is a new way for people to log in. Persona is the password specialist, and takes the burden/risk away from sites for having to worry about passwords altogether. Read more about Mozilla Persona.
So you think you’re cool and can’t be cracked? Challenge accepted!
Make no mistake: just like everybody else, we’re not invincible at Mozilla. But because we actually take our users’ data seriously, we take precautions like this to mitigate the effects of an attack, even in the unfortunate event of a successful security breach in one of our systems.
If you’re responsible for user data, so should you.