Let’s talk about password storage

Fred Wenzel


During the course of this week, a number of high-profile websites (like LinkedIn and last.fm) have disclosed possible password leaks from their databases. The suspected leaks put huge amounts of important, private user data at risk.

What’s common to both these cases is the weak security they employed to “safekeep” their users’ login credentials. LinkedIn allegedly used an unsalted SHA-1 hash; last.fm allegedly used something even worse, an unsalted MD5 hash.

Neither of the two techniques follows any sort of modern industry standard and, if these companies did in fact use them in this fashion, they exhibit a gross disregard for the protection of user data. Let’s take a look at the most obvious mistakes our protagonists made here, and then we’ll discuss the password hashing standards that Mozilla web projects routinely apply in order to mitigate these risks.

A trivial no-no: Plain-text passwords

This one’s easy: Nobody should store plain-text passwords in a database. If you do, and someone steals the data through any sort of security hole, they’ve got all your users’ plain-text passwords. (That a bunch of companies still do that should make you scream and run the other way whenever you encounter it.) Our two protagonists above know that too, so they remembered that they read something about hashing somewhere at some point. “Hey, this makes our passwords look different! I am sure it’s secure! Let’s do it!”

Poor: Straight hashing

Smart mathematicians came up with something called a hashing function or “one-way function” H: password -> H(password). MD5 and SHA-1 mentioned above are examples of those. The idea is that you give this function an input (the password), and it gives you back a “hash value”. It is easy to calculate this hash value when you have the original input, but prohibitively hard to do the opposite. So we create the hash value of all passwords, and only store that. If someone steals the database, they will only have the hashes, not the passwords. And because those are hard or impossible to calculate from the hashes, the stolen data is useless.
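As a minimal sketch of what “straight hashing” means in practice, here is the idea in Python using the standard library’s hashlib module. The function name straight_hash is made up for illustration, and SHA-1 is used only because it is one of the functions named above, not because it is a good choice.

```python
import hashlib


def straight_hash(password: str) -> str:
    """Return the unsalted SHA-1 hex digest of a password (weak; illustration only)."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


# The same input always produces the same 40-character hex digest...
stored = straight_hash("cheesecake1")
same_again = straight_hash("cheesecake1")

# ...while a slightly different input produces an unrelated-looking one.
different = straight_hash("cheesecake2")
```

Only the digest would be stored. At login time, the submitted password is hashed the same way and the two digests are compared.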

“Great!” But wait, there’s a catch. For starters, people pick poor passwords. Write this one in stone, as it’ll be true as long as passwords exist. So a smart attacker can start with a copy of Merriam-Webster, throw in a few numbers here and there, calculate the hashes for all those words (remember, it’s easy and fast) and start comparing those hashes against the database they just stole. Because your password was “cheesecake1”, they just guessed it. Whoops! To add insult to injury, they also just guessed the password of everyone else who used the same phrase, because the hash of a given password is the same for every user.

Worse yet, you can actually buy(!) precomputed lists of straight hashes (called Rainbow Tables) for alphanumeric passwords up to about 10 characters in length. Thought “FhTsfdl31a” was a safe password? Think again.

This attack is called an offline dictionary attack and is well-known to the security community.
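The attack described above can be sketched in a few lines. Everything here is hypothetical: the “stolen” table, the tiny word list, and the suffixes stand in for a real database dump and a real dictionary. The point is only that each candidate is hashed once and then matched against every user at the same time.

```python
import hashlib


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hypothetical stolen table: username -> unsalted SHA-1 hash.
stolen_hashes = {
    "alice": sha1_hex("cheesecake1"),
    "bob": sha1_hex("cheesecake1"),   # same password, therefore same hash
    "carol": sha1_hex("FhTsfdl31a"),  # not in our tiny word list
}


def dictionary_attack(stolen, words, suffixes=("", "1", "2", "123")):
    """Hash every word+suffix candidate once, then look up all users at once."""
    lookup = {
        sha1_hex(word + suffix): word + suffix
        for word in words
        for suffix in suffixes
    }
    return {user: lookup[h] for user, h in stolen.items() if h in lookup}


cracked = dictionary_attack(stolen_hashes, ["password", "cheesecake", "dragon"])
```

With this toy word list, alice and bob fall together because their hashes are identical. A precomputed table simply moves the cost of building the lookup to before the theft.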

Even passwords taste better with salt

The standard way to deal with this is by adding a per-user salt. That’s a long, random string added to the password at hashing time: H: password -> H(password + salt). You then store salt and hash in the database, making the hash different for every user, even if they happen to use the same password. In addition, the smart attacker cannot pre-compute the hashes anymore, because they don’t know your salt in advance, and a table computed for one salt is useless against any other. So after stealing the data, they’ll have to try every candidate password for every user separately, using each user’s personal salt value.
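A rough sketch of per-user salting, following the H(password + salt) formula above. The function names, the 16-byte salt length, and the choice of SHA-256 are assumptions made for the example, not something the post prescribes.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_with_salt(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, str]:
    """Return (salt, digest); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)  # assumed length; any long random value works
    digest = hashlib.sha256(password.encode("utf-8") + salt).hexdigest()
    return salt, digest


def verify(password: str, salt: bytes, expected_digest: str) -> bool:
    """Re-hash the submitted password with the stored salt and compare."""
    _, digest = hash_with_salt(password, salt)
    return hmac.compare_digest(digest, expected_digest)


# Two users with the same password end up with different stored hashes.
alice_salt, alice_hash = hash_with_salt("cheesecake1")
bob_salt, bob_hash = hash_with_salt("cheesecake1")
```

Both the salt and the digest go into the database row for that user. The salt is not a secret; its job is to make each user’s hash unique.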

Great! I mean it, if you use this method, you’re already scores better than our protagonists.

The 21st century: Slow hashes

But alas, there’s another catch: Generic hash functions like MD5 and SHA-1 are built to be fast. And because computers keep getting faster, millions of hashes can be calculated very quickly, making a brute-force attack even on salted passwords more and more feasible.
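To get a feel for “fast”, the following sketch times plain SHA-1 on a single CPU core. The function name and the sample size are arbitrary, and the resulting number depends entirely on the machine, so no particular figure is claimed here. Dedicated cracking hardware is faster still.

```python
import hashlib
import time


def sha1_guesses_per_second(sample_size: int = 200_000) -> float:
    """Measure roughly how many SHA-1 hashes one core computes per second."""
    start = time.perf_counter()
    for i in range(sample_size):
        hashlib.sha1(b"guess%d" % i).digest()
    elapsed = time.perf_counter() - start
    return sample_size / elapsed


rate = sha1_guesses_per_second()
print(f"about {rate:,.0f} SHA-1 guesses per second on one core")
```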

So here’s what we do at Mozilla: Our WebApp Security team performed some research and set forth a set of secure coding guidelines (they are public, go check them out, I’ll wait). These guidelines suggest the use of HMAC + bcrypt as a reasonably secure password storage method.

The hashing function has two steps. First, the password is hashed with an algorithm called HMAC, together with a local salt: H: password -> HMAC(password, local_salt). The local salt is a random value that is stored only on the web server’s file system, never in the database. Why is this good? If an attacker steals one of our password databases, they would need to also separately attack one of our web servers to get file access in order to discover this local salt value. If they don’t manage to pull off two successful attacks, their stolen data is largely useless.
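The first step could look roughly like this with Python’s standard hmac module. The LOCAL_SALT value and the strengthen function name are invented for the example, and SHA-512 as the underlying digest is an assumption. In a real deployment the local salt would be loaded from server-side configuration rather than written into the code.

```python
import hashlib
import hmac

# Hypothetical value for illustration; in practice this is read from a
# settings file on the web server and never stored in the database.
LOCAL_SALT = b"example-local-salt-kept-in-server-config"


def strengthen(password: str, local_salt: bytes = LOCAL_SALT) -> str:
    """Step one: HMAC the password, using the local salt as the HMAC key."""
    return hmac.new(local_salt, password.encode("utf-8"), hashlib.sha512).hexdigest()


strengthened = strengthen("cheesecake1")
# Without the right local salt, the same password gives a different value.
other_server = strengthen("cheesecake1", local_salt=b"some-other-salt")
```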

As a second step, this hashed value (or strengthened password, as some call it) is then hashed again with a slow hashing function called bcrypt. The key point here is slow. Unlike general-purpose hash functions, bcrypt intentionally takes a relatively long time to be calculated. Unless an attacker has millions of years to spend, they won’t be able to try out a whole lot of passwords after they steal a password database. Plus, bcrypt hashes are also salted, so no two bcrypt hashes of the same password look the same.

So the whole function looks like: H: password -> bcrypt(HMAC(password, local_salt), bcrypt_salt).
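Here is a sketch of the whole two-step shape using only Python’s standard library. Note the substitution: bcrypt is not in the standard library, so hashlib.pbkdf2_hmac stands in for it here as the slow, salted step. This is not the django-sha2 implementation, and the names, the iteration count, and the salt length are all assumptions. With the third-party bcrypt package installed, the slow step would be replaced by its hashpw function and a salt from gensalt.

```python
import hashlib
import hmac
import os
from typing import Tuple

LOCAL_SALT = b"example-local-salt-kept-in-server-config"  # hypothetical, server-side only
ITERATIONS = 200_000  # assumed work factor; tune so one hash is noticeably slow


def _strengthen(password: str) -> bytes:
    # Step one: HMAC keyed with the local salt.
    return hmac.new(LOCAL_SALT, password.encode("utf-8"), hashlib.sha512).digest()


def _slow_hash(data: bytes, salt: bytes) -> bytes:
    # Step two: deliberately slow, salted hash (PBKDF2 standing in for bcrypt).
    return hashlib.pbkdf2_hmac("sha256", data, salt, ITERATIONS)


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (per_user_salt, stored_hash) for a new password."""
    per_user_salt = os.urandom(16)
    return per_user_salt, _slow_hash(_strengthen(password), per_user_salt)


def check_password(password: str, per_user_salt: bytes, stored_hash: bytes) -> bool:
    candidate = _slow_hash(_strengthen(password), per_user_salt)
    return hmac.compare_digest(candidate, stored_hash)


salt, stored = hash_password("cheesecake1")
salt_again, stored_again = hash_password("cheesecake1")
```

Each guess now costs the attacker the full slow computation, and it only works at all if they also obtained the local salt from the web server.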

We wrote a reference implementation for this for Django: django-sha2. Like all Mozilla projects, it is open source, and you are more than welcome to study, use, and contribute to it!

What about Mozilla Persona?

Funny you should mention it. Mozilla Persona (née BrowserID) is a new way for people to log in. Persona is the password specialist, and takes the burden and risk of handling passwords away from sites altogether. Read more about Mozilla Persona.

So you think you’re cool and can’t be cracked? Challenge accepted!

Make no mistake: just like everybody else, we’re not invincible at Mozilla. But because we actually take our users’ data seriously, we take precautions like this to mitigate the effects of an attack, even in the unfortunate event of a successful security breach in one of our systems.

If you’re responsible for user data, so should you.

13 responses

  1. Rudloff wrote:

    Don’t you think one good thing would be to update HTTP Digest Auth to use a better algorithm than MD5?

  2. Pingback from Let’s talk about password storage | fredericiana:

    [...] talk about password storage Note: This is a cross-post of an article I published on the Mozilla Webdev blog this [...]

  3. Justdave wrote:

    @Rudloff: nobody has attempted to tackle strengthening the HTTP Auth encryption because there’s not much point. You can use plaintext HTTP Auth (rather than MD5 digest) if you want, as long as you do it over SSL. Not using SSL on a connection that requires authentication is pretty pointless, since it can be man-in-the-middled, whether your password itself is encrypted or not. So the real work goes into strengthening SSL.

  4. Laurian Gridinoc wrote:

    Isn’t Mozilla Persona a single point of failure? (think security and also QoS)

    Why not have the browser enforce unique passwords? If I sign up on Twitter with the password 12345, the browser could salt/hash/etc. it.
    Also when signing up to Facebook later the browser should not allow me to use a password I’ve already used on another domain.

  5. Stephanie Daugherty wrote:

    @Laurian Gridinoc – There’s something along these lines already available in the form of a couple Firefox addons. https://addons.mozilla.org/en-US/firefox/addon/password-hasher/ is one. The idea is that the name of the site you are giving the password to is used as a salt, inherently making the password unique between sites and expanding the search space for someone trying to brute force it. Not foolproof, but it does buy you some time if one site gets hacked.
    Not really a security guru or cryptographer, but from casual inspection, it probably falls somewhere square in the middle of typical passwords and cryptographically derived random passwords as far as security and resistance to brute force.

  6. Dan Callahan wrote:

    Hi Laurian,

    I’m Dan, and I work on Persona. Whether or not Persona is a single point of failure is a great question, and one that we’re trying hard to address. Fundamentally, Persona is built atop a distributed protocol that doesn’t rely on any central servers or authority.

    Think of it like this: With Persona, you’ll be able to sign in to any supporting website by signing in to your email provider and passing along proof that you were able to do that. It gives you the ability to say to a website “Hi, I’m dcallahan@mozilla.com, and here’s the proof.”

    To do that transaction completely autonomously, you only need a browser that supports Persona and an email provider that speaks the BrowserID protocol. But here’s where things get tricky: No browser understands Persona (though we’re working on it for Firefox 17!), nor do most email providers understand the BrowserID protocol.

    So, what do we do?

    As a fallback, we have a javascript library at browserid.org that takes care of Persona support on browsers that can’t do it themselves. If that site goes down, those users won’t be able to even attempt to log in to sites using Persona. We also host a service at that same site that signs credentials for users whose email providers don’t understand the BrowserID protocol. For example, until gmail.com speaks the BrowserID protocol, Gmail users will have their credentials validated by our fallback at browserid.org.

    This second bit *is* a single point of failure in terms of security: if browserid.org gets compromised, then attackers will be able to masquerade as Persona users on other websites. All of our code is open source and subject to peer review at https://github.com/mozilla/browserid, and we can also mitigate the damage of a breach by closing the hole and invalidating the cryptographic keys that we’re using at browserid.org.

    But remember, everything at browserid.org (soon to be login.persona.org) is a fallback. If your email provider speaks the BrowserID protocol (or if you add support to your own domain), then you’re no longer at risk by a potential compromise at browserid.org. You get to choose who you trust, and, if you want, to take direct control of the security around your identity. Awesome!

    …But doesn’t that make your email provider a single point of failure? Yes, but it *already is.* If someone compromises your email address, it’s trivial to gain access to your accounts elsewhere through their “forgot password” links. The difference with Persona is that *those* sites are no longer storing any sensitive authentication information, so, say, a LinkedIn leak won’t put you at risk elsewhere.

    Lastly, as to your question about having the browser proactively help users select and use better passwords, that’s an active focus of Paul Sawaya’s Watchdog project. :)

    If you have any questions, please let me know! You can find the Identity team on irc.mozilla.org in #identity, or on our mailing list at https://lists.mozilla.org/listinfo/dev-identity.

  7. anon wrote:

    Oh wow. This single post FINALLY made me understand the concept behind Persona. Somehow all the posts on the BrowserID blog (and following the Planet Mozilla since like a year) didn’t help.
    Thx a lot, Dan!

  8. Storage Organizers wrote:

    MD5 is pretty loose; there are free programs out there to decrypt it.

  9. Nicholas Nethercote wrote:

    Dan, I also don’t recall any of the BrowserID blog posts explaining things this nicely :)

  10. Chris wrote:

    Why is the HMAC-strengthened password necessary? Why not just bcrypt(password, user_salt + system_nonce)?

  11. Fred Wenzel wrote:

    bcrypt’s salt has a fixed length, which is why bcrypt ships with its own gen_salt function: https://github.com/fwenzel/python-bcrypt/blob/6a7fb96/bcrypt/bcrypt.py#L48

  12. Amr wrote:

    The challenge is not whether this method can be cracked; the challenge is how to package this technique in a way that would make it reusable by other websites. There should be some easy-to-use solution for how to store passwords that would make it very easy for websites to pick up quickly and use. This will make the Internet better, and this is part of Mozilla’s mission.

  13. Pingback from Peek Inside | TechSNAP 63 | Jupiter Broadcasting:

    [...] Let’s talk about secure password storage [...]