Deciding Which Hashing Algorithm to Use

I've written a few authentication routines in my career and I've made sure to always hash sensitive user information. The thing is: I usually Googled which hashing algorithm to use and asked friends' advice. Turns out there's more than just bcrypt... and now I know when and why to choose something different.

I just finished up the final video of the CS Basics video series and I learned a ton.

CS Basics

Crypto is a major weakness of mine and a subject I’ve put off learning about for ages. I’ve spent a few months with it now and it’s so much fun to learn about - specifically hashing. Here’s what I found out - and if you’re interested, this video is part of a 6+ hour video series I created to go along with The Imposter’s Handbook.

Gently back away from the cipher please…

Cryptography is one of those subjects, to me anyway, that you can learn just enough about to slowly back away, trusting other people who have made disastrous mistakes.

For instance: I know that you should ensure that your auth library is hashing passwords instead of just encrypting them. That hashing algorithm should not be SHA-1 or MD5 and, if possible, don’t roll your own auth. I’ve broken that last one a few times, however.

I’m OK with these guidelines as I think are most developers. When it comes to hashing, however, I think most of us just roll with bcrypt:

bcrypt was designed by Niels Provos and David Mazières based on the Blowfish cipher: b for Blowfish and crypt for the name of the hashing function used by the UNIX password system.

The article linked is from Auth0 and does a good job breaking down why bcrypt is a good choice for hashing… however I run into a problem that I often run into when reading things about crypto: you get lost in the details very quickly.

For example (from same article):

The Blowfish cipher is a fast block cipher except when changing keys, the parameters that establish the functional output of a cryptographic algorithm: each new key requires the pre-processing equivalent to encrypting about 4 kilobytes of text, which is considered very slow compared to other block ciphers. This slow key changing is beneficial to password hashing methods such as bcrypt since the extra computational demand helps protect against dictionary and brute force attacks by slowing down the attack

Ummmm 🤷🏽‍♀️…

The weaknesses of hashing algorithms

There are two primary ways to break a hash:

  • Causing a collision, which means you create the exact same hash for two different bits of data
  • Using a pre-image, which involves guessing some values and hashing them in an attempt to match a different hash.

We care about collisions in general, but (from what I understand) not when it comes to storing user information. What we care about there is the pre-imaging attack.

If you don’t know what that is, have a look at this Google link:

Password 123

Clever hackers have “pre-imaged”, or “pre-hashed” if you will, loads of common passwords and also credit card numbers. They can sidestep, entirely, your hashing algorithm by simply running a match to their pre-imaged data.

In fact, they’ve put it all online. They’re called “rainbow tables” and you can download a whole mess of hashes if you want.

The point: how to avoid the attack

Let’s assume that your data will be compromised. At some point in your career this will likely happen. You might not be the one responsible, but data that you’ve stored in a database will end up in the hands of a bad person. This has happened to me twice.

When that happens we hope and pray that our hashing algorithms are strong enough (or hardened) to withstand the brute force attacks coming at them. This is why we need to be sure our hashing algorithm choice is carefully thought through.

First line of defense: salts

You probably know this much already but if you append a random value to what you’re hashing you can thwart a brute force attack pretty easily. This might not always work, however, because if you’re careless and append the same random value and hard code it in your code… you’re done.

Let’s say you keep that salt secret. It’s also possible to brute force the salt as well, generating 32 bits of random bytes (for instance) and appending to a rainbow table value. That’s a lot of computation, but it turns out that some of these hackers have pretty sophisticated hardware:

in theory, if an attacker is up to crack all your passwords by bruteforce, unless he’s insane, he also has the resources to do so (no laptop allowed here :) ) and therefore he would use an ASIC (or GPU rigs) or FPGA hardware to do so. ASIC and FPGA are just very optimized hardware to accomplish ONE task; an ASIC is a static circuit (meaning that the hardware is STRICTLY built for that algorithm), a FPGA can be reprogrammed.

What you and I think might be a lengthy cracking operation might take very little time at all!

Choosing the right algorithm

Let’s now assume that our hacker is aware of your salt, can potentially guess it, and is about to fire up their special hardware to brute force your users’ passwords.

We still have some tricks we can use.

If you, like me, normally use bcrypt then we’re reasonably covered. bcrypt is a slow hashing algorithm which uses CPU time to generate the hash. It might seem counterintuitive to have a slow hashing process but for your user, it’s nothing. And in all likelihood your login routine is not going to happen all that often:

const bcrypt = require('bcryptjs');
//10 rounds of salt - add more to slow things down
const salt = bcrypt.genSaltSync(10);
let hash = bcrypt.hashSync("Password123", salt);

This is going to severely slow down our hacker! They can’t use rainbow tables directly any more since you’ve salted your passwords, so each guess they make must be processed using this slow hashing algorithm. It’s possible this could mean billions (or even trillions depending on your choices) of calculations that each take 0.5ms. That’s way too long.

Or not. If our hacker can parallelize the hacking process (spreading it out over multiple CPUs/GPUs) they can cut this time down dramatically.

That’s where other algorithms come in - specifically scrypt and PBKDF2. scrypt, in particular, allows you to specify a “cost” when you hash a value. The higher the cost, the more CPU and RAM the process takes:

const hashPassword = function(pw){
  return new Promise(function(resolve, reject){
    crypto.scrypt(pw,"salty-secret", 32, {cost: 2**14}, function(err, buffer){
      if(err) reject(err);
      else resolve(buffer.toString("hex"))
    });
  })
}

That’s what we’re trying to do here: make the brute force attack a costly one. PBKDF2 doesn’t have the same resilience to a parallelized attack as bcrypt and scrypt. bcrypt works pretty well and has been around forever - but scrypt is newer and, if you use node, is baked right into the crypto library. The way you can tune the cost of both CPU and RAM cycles is super useful as well.

OK but really: which one?

Honestly it comes down to what you need to hash. From what I’ve read while researching these videos - bcrypt still does a solid job of hashing AND will ensure that you have a salted password. You can also tune the CPU load by upping the “rounds” - a value you set when creating the hash. With the Node library (bcryptjs - which I like as it doesn’t have native dependencies) the default is 10.

If I had to choose one for my next project, I’d probably go with scrypt. I like the idea that you can flex CPU and RAM (mitigating threaded attacks) and that it’s built into Node.

But honestly - it’s probably best to do some reading on this for yourself. What you see here is the result of my digging in and trying to learn a bit more. Don’t regard me as an expert!