Given how often it seems to occur, we perhaps have grown numb to hearing about how a company’s website got hacked and the perpetrator made off with users’ account information. High-profile breaches such as those against Marriott, Target, Sony, and Yahoo used to grab our attention. Sadly, such occurrences don’t register as loudly today because they seem more like an expectation than truly unexpected news.
What actually happens when a hacker breaks into a site and makes off with customer information? Let’s consider a common scenario. Suppose the hacker tricked a form on the website into divulging more information than it should have, using an attack such as SQL injection, one of the most common ways to compromise a website. To perform one, the hacker enters specially formatted input into one of a form’s fields and clicks the “submit” button. This input tricks the database behind the website, the one holding all the users’ data, into responding not just with one matching record but, in the worst case, with an entire table full of data. For example, a hacker could wage an SQL injection attack against a login page and get back an entire table of usernames and passwords. Bad news, right?
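To make the idea concrete, here is a minimal sketch using Python’s standard-library sqlite3 module and an in-memory database. The users table, its columns, and the placeholder hash strings are hypothetical, and the input ' OR '1'='1 is the classic textbook payload rather than anything taken from a real breach. The unsafe version pastes the input straight into the SQL text, so the WHERE clause becomes always true; the safe version passes the input as a query parameter, the standard defense.

```python
import sqlite3

# Throwaway in-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password_hash TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "hash-a"), ("bob", "hash-b"), ("carol", "hash-c")],
)

def lookup_unsafe(username):
    # VULNERABLE: the user's input is pasted directly into the SQL text.
    query = f"SELECT username, password_hash FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()

def lookup_safe(username):
    # Parameterized query: the driver treats the input purely as data, never as SQL.
    query = "SELECT username, password_hash FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()

malicious = "' OR '1'='1"
print(lookup_unsafe(malicious))  # every row in the table
print(lookup_safe(malicious))    # []
```

With the malicious input, the unsafe query becomes WHERE username = '' OR '1'='1', which is true for every row.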
Well, it usually isn’t as bad as it sounds. Unless the website developer is so criminally foolish that they store users’ names and passwords in plain text, as appears to have been the case recently for one electric utility, the hacker will still have a considerable amount of work to do before the information they’ve retrieved is of any use. That’s because almost all reputable sites today don’t store passwords in plain text. Instead, they store them as gibberish, a nonsensical-looking value called a hash.
A hash is a very large number. It results from applying a complicated mathematical function, called a hash function, to a piece of text; common ones include MD5 and SHA-256. These functions treat the characters in the text as numbers and then combine them through a long series of arithmetic and bitwise operations, often in strange modular number systems that algebraists can describe with a bit too much fondness. What results is a single, extremely large number called the hash. By extremely large, I mean a number with dozens of digits: SHA-256 produces a 256-bit value, which can run to 78 decimal digits, and even the older MD5 produces a 128-bit value of up to 39 digits. Instead of storing your password as plain text, the website stores this number. That is the prize the hacker takes when they break into a site and steal your account information.
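A quick way to see that a hash really is just a big number is to compute one. This sketch assumes Python’s standard hashlib module and reads a SHA-256 digest as a single integer; the sample password is arbitrary.

```python
import hashlib

def sha256_as_int(text):
    # Encode the text to bytes, hash it, and read the 32-byte digest as one integer.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

h = sha256_as_int("correct horse battery staple")
print(h)                      # one very large integer
print(len(str(h)))            # at most 78 digits
print(len(str(2**256 - 1)))   # 78: the largest possible SHA-256 value
print(len(str(2**128 - 1)))   # 39: the largest possible MD5 value
```

The same value is usually displayed in hexadecimal, which is the 64-character string hashlib.sha256(...).hexdigest() returns.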
What can the hacker then do with your password’s hash? Not much, unless they do a lot more work. When you log into a website, the password you type is sent to the remote web server. The server computes the hash of that password and compares it with the hash stored in your row of the username-password table. If the two match, you are allowed into the website. If they don’t, the site displays an error message telling you that you failed to log in. You, or the hacker, are allowed to proceed only if you enter a password that gives rise to the same hash value that is already on file for you.
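The login check described above can be sketched in a few lines. This is a simplified illustration, not how any particular site works: the hypothetical stored_hashes dictionary stands in for the username-password table, and a single unsalted SHA-256 is used only to mirror the description so far.

```python
import hashlib
import hmac

# Hypothetical stand-in for the username-password table: username -> hex SHA-256.
# (Unsalted SHA-256 is used only for illustration; it is not a recommended scheme.)
stored_hashes = {
    "alice": hashlib.sha256(b"tr0ub4dor&3").hexdigest(),
}

def login(username, password):
    stored = stored_hashes.get(username)
    if stored is None:
        return False
    # Hash what the user typed and compare it with the hash on file.
    attempt = hashlib.sha256(password.encode("utf-8")).hexdigest()
    # compare_digest avoids leaking timing information about where the strings differ.
    return hmac.compare_digest(attempt, stored)

print(login("alice", "tr0ub4dor&3"))  # True
print(login("alice", "Tr0ub4dor&3"))  # False
```

Note that the server never needs the plain-text password on file; it only needs the hash to compare against.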
Sounds implausible. Is it?
Hash functions have many remarkable properties. For instance, they are one-way, meaning that you can easily compute the hash that corresponds to a piece of text like a password, but you can’t feasibly recover the original text from that number. That’s a good thing, because it means that even the website owner doesn’t know what your actual password is and can’t work it out from the stored hash. That gives you some degree of privacy against the website owner, as they can’t use the stored hash to log in on your behalf. It also gives the website owner plausible deniability if they are ever accused of logging into your account because, mathematically speaking, they don’t know your actual password.
Another property of hash functions, one that isn’t so good, is that they are many-to-one: many different pieces of text, such as many different passwords, hash to the same number. In theory, infinitely many passwords can share the same hash value, because there are infinitely many possible passwords but only a finite number of hash values. This gives the hacker a way in. The hacker doesn’t have to determine your exact password; they simply have to find any password that gives rise to the exact same hash value. If they find just one of these proxy passwords, no matter how different it may look from your actual password, they’ll be allowed into the site on your behalf.
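Finding a proxy password for a full SHA-256 hash by blind search is hopeless, since it would take on the order of 2**256 attempts. To see the many-to-one property in action anyway, this sketch makes a deliberately artificial assumption: the hypothetical tiny_hash keeps only the first 16 bits of SHA-256, so there are just 65,536 possible values, and a different password with the same truncated hash typically turns up after a number of guesses on that order.

```python
import hashlib
from itertools import count

TRUNC_BITS = 16  # hypothetical: keep only 16 bits so a match is quick to find

def tiny_hash(text):
    # Take SHA-256 and keep only the top TRUNC_BITS bits. This toy "hash" has
    # just 2**16 possible values, so different inputs frequently share one.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - TRUNC_BITS)

def find_proxy_password(target_hash, real_password):
    # Try candidates "guess0", "guess1", ... until one hashes to the target.
    for i in count():
        candidate = f"guess{i}"
        if candidate != real_password and tiny_hash(candidate) == target_hash:
            return candidate, i + 1

real = "my secret password"
target = tiny_hash(real)
proxy, tries = find_proxy_password(target, real)
print(proxy, tries)
print(tiny_hash(proxy) == tiny_hash(real))  # True: the proxy would be accepted
```

Against real, full-length hashes, this kind of search is infeasible, which is why attackers in practice concentrate on guessing likely passwords rather than hunting for arbitrary collisions.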
That is why stolen user account information is valuable to a hacker. A list of password hashes gives them concrete targets as they try to find proxy passwords that will let them into the site.
They could try every password imaginable in the hope of getting lucky, but the sheer number of possibilities makes that intractable. Instead, they use more sophisticated techniques, such as rainbow tables, to speed up the password cracking. Rainbow tables are built ahead of time from huge numbers of candidate passwords and require a lot of computing power and large amounts of storage to be effective, but a determined hacker is likely to have both. A website owner can make its stored passwords far more resistant to rainbow-table attacks by adding a random string called a salt to each password before hashing it, because a table precomputed without that salt no longer matches. Not all websites do this, however, and if a site skips it, that’s when you need to worry.
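The effect of a salt can be shown with a plain precomputed lookup table. This is a simpler relative of a rainbow table (a real rainbow table stores chains of hashes to trade extra computation for less storage), and the wordlist, the 16-byte salt size, and the check function are all assumptions for illustration. Production systems typically go further and use a deliberately slow, salted scheme such as PBKDF2 (available as hashlib.pbkdf2_hmac), bcrypt, scrypt, or Argon2 rather than a single round of SHA-256.

```python
import hashlib
import hmac
import os

# Hypothetical wordlist of common passwords an attacker might precompute.
COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein", "dragon"]

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Attacker's precomputed table mapping hash -> password.
# (A plain lookup table, not a true rainbow table with hash chains.)
lookup_table = {sha256_hex(p.encode()): p for p in COMMON_PASSWORDS}

# Unsalted storage: a stolen hash can be looked up directly.
stolen_unsalted = sha256_hex(b"letmein")
print(lookup_table.get(stolen_unsalted))  # letmein

# Salted storage: a random salt is prepended before hashing and stored beside the hash.
salt = os.urandom(16)
stolen_salted = sha256_hex(salt + b"letmein")
print(lookup_table.get(stolen_salted))  # None: the precomputed table no longer matches

def check(password: str, salt: bytes, stored: str) -> bool:
    # The site can still verify logins, because it keeps the salt with the hash.
    return hmac.compare_digest(sha256_hex(salt + password.encode()), stored)

print(check("letmein", salt, stolen_salted))  # True
```

Because each user gets a different salt, the attacker would have to rebuild the table for every salt they steal, which defeats the point of precomputing it.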
So now you know what a hacker can do if they manage to steal user account information from a website. They don’t immediately get the keys to your online kingdom, but they have learned some clues, specifically the hash, that can help them find those keys. And that’s why you should still care.