Each year, the Open Security Foundation gathers information from the offices of Attorneys General and consumer protection agencies all over the US about data breaches. The information they publish in the statistics section at DataLossDB.org is startling. 2012 is on track to be a record-setting year for the loss of Personally Identifying Information (PII). Of the corporate and government data breaches reported this year, 61 percent are the result of hacking attempts. The rest are the result of fraud, stolen laptops and the like.

In one of the more highly publicized breaches of 2012, the professional/social networking site LinkedIn.com lost more than 6 million sets of user credentials to hackers. With the minimal cost of recovery per incident around US$60, the company could be liable for as much as $387 million. The costs could go much higher as the dominoes fall, so to speak. Security breaches tend to yield higher and higher damages as hackers use the spoils of one victory to wage even more nefarious campaigns against their victims.

What most people don't know about the LinkedIn.com breach of June 2012 is that no clear-text passwords were actually stolen by the hackers. The developers at the social networking company were smart enough to hash the passwords before storing them in their database. Unfortunately, their cleverness ended there. The LinkedIn.com developers violated some basic best practices for handling the credentials of end users. And they'll likely pay the price for it.

What the criminals did steal from LinkedIn.com, however, were the unsalted, single-hash values of more than 6 million passwords. For a hacker who knows what to do with such data, many of the passwords were simple enough to figure out. What's an unsalted, singly-hashed password, you ask? And why isn't simple hashing good enough? Those are good questions to ask. Before I dive into the specifics of the best practices for managing passwords, let's do a little primer on encryption, hashing and key strengthening.

What is encryption and what is hashing?

Encryption involves changing messages so that they cannot be understood by parties that don't have an appropriate key. Some encryption uses what are known as symmetric keys. That word symmetry comes from ancient Greek and it literally means "measured together." That's a great way to remember that symmetric key encryption involves using the same key for both the encryption step and the decryption step. The difficult part about symmetric key encryption is making sure that both the encrypter and the decrypter fetch and manage the shared key in a secure way. If the key were ever compromised during symmetric encryption, listeners could understand the entire conversation.
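To make the shared-key idea concrete, here is a toy sketch in Python (my choice of language for illustration; nothing in this article depends on it). It uses a one-time XOR pad, which is an assumption on my part and not a cipher you should deploy as written. The helper name xor_bytes is hypothetical. The point is only that the very same key both scrambles and unscrambles the message.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each message byte with the matching key byte."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"meet me at noon"
# Both parties must somehow obtain this key securely: that is the hard part.
shared_key = secrets.token_bytes(len(message))

ciphertext = xor_bytes(message, shared_key)    # encrypt
recovered = xor_bytes(ciphertext, shared_key)  # decrypt with the same key

print(recovered == message)  # True
```

Anyone who intercepts shared_key can run the second call just as easily as the intended recipient, which is why key distribution dominates the design of symmetric systems.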

Asymmetric encryption, on the other hand, uses one key for encoding messages and a different one for decoding them. It's typically slower than symmetric encryption, taking more time and processor power to do the work. However, asymmetric encryption has the advantage that both parties can use different keys. Moreover, if one party keeps the decryption key completely private, the key used for encryption can be shared with anyone. This practice is sometimes referred to as public key encryption for that very reason. Public key encryption allows two parties to have a private conversation in a crowded room of potential listeners, like the Internet.
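A toy sketch may help here as well. The Python below walks through textbook RSA with deliberately tiny primes; the specific numbers and the function names encrypt and decrypt are my own assumptions for illustration. Real systems use vetted libraries, padding schemes and keys that are hundreds of digits long.

```python
# Textbook RSA with tiny primes, for illustration only.
p, q = 61, 53
n = p * q                 # public modulus
phi = (p - 1) * (q - 1)
e = 17                    # public exponent: safe to share with anyone
d = pow(e, -1, phi)       # private exponent: kept secret (needs Python 3.8+)


def encrypt(m: int) -> int:
    """Anyone holding the public pair (e, n) can encrypt."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """Only the holder of the private exponent d can decrypt."""
    return pow(c, d, n)


message = 65
ciphertext = encrypt(message)
print(decrypt(ciphertext) == message)  # True
```

Notice that the encrypting side never needs d, so publishing e and n gives an eavesdropper nothing that directly decodes the traffic.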

Hashing is not encryption because there is no key that can turn a hash value back into the original message. In fact, hashing can't be used to have a private conversation because the original messages get "shredded up" (or hashed) in the process. For this reason, you may sometimes hear cryptographic hashing referred to as one-way hashing. The idea of hashing involves converting a message or phrase into a relatively small, fixed-size numeric value. For example, the following line of SQL code would convert the string password into a number.

SELECT HASHBYTES('SHA1', 'password')

Using Microsoft SQL Server, this statement would produce a 160-bit integer using the SHA1 hashing function. For a single-word input, that seems like a big number to produce, but SHA1 will produce a number of exactly the same size for any input. You could hash all the works of Shakespeare at once using the SHA1 function and it would still produce a single, 160-bit integer in response. However, if you supplied all of the works of that famous bard to the hashing function again, changing just one word, the integer that pops out would almost certainly be different. This tendency of hashing functions to produce massively different results from small changes to the input makes them ideal for validating passwords.
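The same behavior is easy to observe outside the database. The following is a minimal sketch in Python using the standard hashlib module; the helper name sha1_hex is hypothetical, and I'm assuming the input is encoded as UTF-8, which matches a plain ASCII string literal in SQL Server.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA1 hash of text as 40 hexadecimal digits (160 bits)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


short = sha1_hex("password")
long_ = sha1_hex("to be or not to be " * 10_000)  # a much longer input
changed = sha1_hex("Password")                    # one character differs

print(short)                           # 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8
print(len(short) * 4, len(long_) * 4)  # 160 160
print(short == changed)                # False
```

The output size never changes, no matter how long the input is, and a one-character change produces an unrelated-looking value.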

Are some hash functions better than others?

Lots of cryptographic hashing functions exist. In the tiny example shown above, SQL Server's HASHBYTES function is used to perform a SHA1 hashing of a single word. In that example, I could have chosen to use the so-called MD2, MD4, MD5 or SHA algorithms instead. Each of those is built into Microsoft's product, too. If you have never done cryptographic hashing before, you may be asking why you should use one of those algorithms and not the others.

As it turns out, all of the built-in hashing functions in Microsoft SQL Server are poor choices for singly-hashed phrases. In other words, if you're going to hash an input value just once and store it, all of the built-in hashing algorithms in SQL Server are pretty weak. Cryptographers have published attacks that weaken the MD2, MD4, MD5, SHA and SHA1 algorithms in recent years. Having said that, hashing with these algorithms still has the advantage that the original message is typically not derivable from the hash value. Going back to the Shakespeare example cited earlier, how would it be possible to produce the entirety of Shakespeare's work from a single, 160-bit integer? If that were possible, the hashing algorithm might better serve as a data compression tool from a commercial perspective.

What the publicized exploits of these hashing algorithms do make possible is a reduction in the number of candidate phrases that must be tested. This is an important bit of understanding that leads to one of the key recommendations I'll outline later on. In the meantime, I'll answer the question that's most likely on your mind at this point: is it appropriate to use any of the hashing functions built into Microsoft SQL Server for hashing passwords and other information? The answer is yes, but only if you follow the other best practices outlined in this article. When I reveal to you how LinkedIn.com was hacked in a few moments, you'll understand why I can make this claim.

What is key stretching?

Key stretching can make a weak password feel stronger to hackers. I used the word feel because nothing can really make a weak password strong. But the way in which the pass phrase is applied during encryption or hashing can make it more difficult for the hacker to undo the obfuscation of the original message. This can significantly lengthen the time required to crack the code. The most common form of key stretching is the repeated folding in of the encrypted text, pass phrases or hash values back into the algorithm. This is often done in conjunction with a so-called salt value that further complicates the cracking process by introducing new ingredients during each iteration. I like to use a cooking metaphor to help understand how key stretching strengthens the cryptography process.

Imagine a bowl of flour from which we want to make pancakes. The flour will serve as the original message in the metaphor. To make pancakes, we'll need eggs and milk folded into the flour. If we used only one stroke to mix in the other ingredients, the batter wouldn't be consistent. An observer would be able to look at the bowl to discern the flour, the eggs and the milk. After many strokes however, the original ingredients would be blended together making it much more difficult to distinguish the constituent parts. In a sense, the blending process stretches the other ingredients through the flour to form a highly consistent batter. Given sufficiently sophisticated chemistry and time, it may be possible to undo the mixing of the ingredients to recover the original bowl of flour. But it would be very expensive to do so, as you can imagine. Cryptographic key stretching is very much like making batter. By repeatedly stroking the mixture into itself and adding new ingredients along the way, the task of undoing the obfuscation becomes much more complex.
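In code, the stirring looks roughly like the Python sketch below. It is a hand-rolled loop meant only to show the idea of folding the hash value back into the algorithm along with a salt. The function name stretch, the salt values and the round count are my own assumptions, and a vetted scheme such as PBKDF2 is the safer choice for real systems.

```python
import hashlib


def stretch(password: str, salt: bytes, rounds: int) -> bytes:
    """Fold the previous digest, the password and the salt together, round after round."""
    pw = password.encode("utf-8")
    digest = b""
    for _ in range(rounds):
        digest = hashlib.sha256(digest + pw + salt).digest()
    return digest


salt = b"customer-1001"  # hypothetical per-user salt value
once = stretch("password", salt, 1)
many = stretch("password", salt, 100_000)
other_salt = stretch("password", b"customer-1002", 100_000)

print(once != many)        # True: more strokes give a different batter
print(many != other_salt)  # True: same password, different salt, different hash
```

An attacker testing a guess must now pay for every round, for every guess, for every distinct salt. A legitimate login pays that cost only once.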

So why was the LinkedIn.com hack successful?

I've already mentioned that LinkedIn.com didn't lose passwords in the June 2012 attack. They lost cryptographic hashes of passwords. So why was the attack so damaging? It's not as if the hackers could impersonate the LinkedIn.com subscribers directly with the information that they obtained. By comparison, the Yahoo Voices site was hacked just five weeks after LinkedIn.com was breached. In that incident, the clear text passwords of nearly half a million users were exposed on the Internet. Yahoo hadn't even bothered to hash the passwords before storing them in their database. Using Yahoo's victim list, there was literally no work to be done to impersonate any of the users named within. The developers at LinkedIn.com at least had the courtesy to hash the passwords before saving them. So why was the LinkedIn.com attack successful for the hackers? There are three reasons:

  1. The developers did not use key stretching with salting in their hashing scheme.
  2. End users do a really lousy job of picking passwords on their own.
  3. The policy at LinkedIn.com does not require that users choose better passwords.

To make this clear, execute the SELECT statement shown earlier to produce the SHA1 hash of the word password. You'll see that it produces this integer:

0x5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8

That may look enough like gibberish that it's not discernible as the pass phrase password. But searching through the file of compromised LinkedIn.com credentials, this big integer will undoubtedly be found hundreds or thousands of times. That's because some portion of the population will always use the word password as their password if given the chance to do so. Similarly, hashing common names like Donna or Kevin or Fido with SHA1 will produce specific integers for which you could scan the millions of credentials in the file for more matches.

Lo and behold, expanding this sort of dictionary attack to many common words and names, a significant number of matches will begin to appear in such a large population. Skilled hackers would use techniques much more sophisticated than this to uncover many of the original passwords from the file. Once they've cracked some of the accounts, they often share the passwords en masse with their buddies and the damage from the problem starts to cascade. The effects of an individual breach are easily magnified if the targeted user uses the same e-mail address and password scheme at other popular Internet sites. Moreover, with such intimate knowledge of one's credentials, phishing attacks become potentially more effective, too.
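The dictionary attack described above takes only a few lines to sketch. The Python below runs against a made-up table of four accounts; the account names, the word list and the helper sha1_hex are all hypothetical, and a real attack would use far larger lists and faster tools.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Unsalted, single-pass SHA1, as in the compromised file."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hypothetical leaked table: account id -> unsalted SHA1 hash.
leaked = {
    "user1": sha1_hex("password"),
    "user2": sha1_hex("Fido"),
    "user3": sha1_hex("x9!Lq#72vTz@"),
    "user4": sha1_hex("password"),
}

dictionary = ["password", "Donna", "Kevin", "Fido"]

# Hash each guess once, then look up every leaked hash in that table.
lookup = {sha1_hex(word): word for word in dictionary}
cracked = {user: lookup[h] for user, h in leaked.items() if h in lookup}

print(cracked)  # {'user1': 'password', 'user2': 'Fido', 'user4': 'password'}
```

Because the hashes are unsalted, each guess is hashed only once and then compared against every account in the file. That is why two users with the same weak password fall together.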

Best Practice #1 - Never Store Passwords, Encrypted or as Clear Text

It goes without saying that storing clear text passwords in your database is a bad idea. I would never do business with a company that operated that way. In fact, immediately after establishing a new account with an Internet-based company, I will always tell them that I've lost my password. If the company mails me my original password, I know that they are storing it in their database instead of hashing it. That triggers me to cancel my account and write a letter of complaint to the company.

It's tempting to say that encryption of the passwords in the database is good enough. But this brings to mind all of the engineering choices and processes that must be developed for key management. If you choose symmetric key encryption, someone has to escrow and protect the shared key from discovery outside the database. If you use asymmetric encryption, there are similar concerns for managing the private key that will be used during decoding.

Moreover, the existence of the original passwords in the database, encrypted or otherwise, leads to all sorts of concerns about disaster recovery, archiving and off-site storage. Hashing of passwords is the better choice because it eliminates all of these concerns. Use other static user-specific values, e.g. the customer number or their birth date, as the salt and do key stretching to blend the pass phrase and the salt into the batter.

On a final note concerning production and storage of hash values, consider using a stronger hashing algorithm than the ones built into your database product. Third-party libraries that perform hashing with algorithms like SHA2 (now required by the US government), Whirlpool and RIPEMD-320 are not expensive in the grand scheme of things. Make the investment and buy a good cryptography library. If your database is ever compromised, you'll be in a much better position having done so. These newer hashing algorithms may still be cracked in the future, but they are currently believed to be much more difficult to break.
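Pulling these recommendations together, a store-and-verify routine might look something like the Python sketch below. It leans on PBKDF2 with SHA-256 from the standard hashlib module. Note two assumptions of mine: the iteration count is an arbitrary starting point, and the salt is a random value stored alongside the hash, which I have swapped in for the customer number or birth date suggested above. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 200_000  # assumed work factor; tune it for your own hardware


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); store both, never the password itself."""
    salt = secrets.token_bytes(16)  # random per-user salt (my substitution)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-run the same stretching and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("password", salt, stored))                      # False
```

Only the salt and the digest ever reach the database, so there is nothing to decrypt and no key to escrow.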

Best Practice #2 - Segment User Credentials in the Database

Peeking at the DataLossDB.org statistics mentioned at the start of this article, you'll notice that 24 percent of the attacks leading to data loss in 2012 come from inside corporate networks. This is a trend that's growing and not likely to slow down any time soon. The best way to protect your company from malicious or careless employees is to segment any credentials data from the rest and to severely limit access to a single user identity.

Start with schema separation. I recommend splitting hashed passwords and your user-specific salt values into a separate schema that's locked down. The underlying tables should be write-only, meaning that the information stored within them should never be directly queryable. Other than writing changes into the schema, which should be a highly privileged operation, the only other function that should be allowed is the testing of pass phrases to see if they match. The key stretching process and the specific salt values used in the process should be hidden inside encrypted functions and procedures that your application developers can call to obtain the necessary privilege. Only the architect and a few DBAs should know how this service works. The developers should see password management as a black box buried and protected deep within your database.

Best Practice #3 - Maximize Auditing on Credentials Data

With respect to auditing, we tend to minimize logging in production environments to avoid the maintenance costs of storing so much data. However, when it comes to passwords, you should never scrimp on auditing and metadata. You may have noticed that on some websites, after entering an incorrect password, the site will respond by saying something like, "You changed this password 3 months ago." Some of Google's sites do this now, for example. This is great information but it could also be used by a hacker to refine their attack. So I don't recommend revealing that sort of information during login.

What such hints reveal, though, is that the site is storing metadata about the management of user credentials. That's a good thing because such information makes it possible to support your users better. Moreover, the audit log for password changes and use can be mined for interesting trends like frequency, complexity, invalid credential attempts, etc. Be sure not to store personally identifying information in the audit log, of course.
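As a rough sketch of what such an audit trail could hold, consider the Python below. The record layout, the event names and the in-memory list are all assumptions for illustration; a production system would write to a protected table instead. Each entry carries only an opaque account number, never a name, an e-mail address or a password.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CredentialEvent:
    account_id: int  # opaque surrogate key, not a name or e-mail address
    event: str       # e.g. "login_failed", "password_changed"
    at: datetime


audit_log = []  # stand-in for a locked-down audit table


def record(account_id: int, event: str) -> None:
    """Append one timestamped credential event to the log."""
    audit_log.append(CredentialEvent(account_id, event, datetime.now(timezone.utc)))


def failed_attempts() -> Counter:
    """Count invalid credential attempts per account."""
    return Counter(e.account_id for e in audit_log if e.event == "login_failed")


record(1001, "login_failed")
record(1001, "login_failed")
record(1002, "password_changed")
record(1001, "login_ok")

print(failed_attempts().most_common(1))  # [(1001, 2)]
```

A sudden spike in failed attempts against one account, or across many accounts at once, is exactly the sort of trend worth mining for.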

Best Practice #4 - Help Your Users Choose Better Passwords

Users are really bad at choosing strong passwords. As demonstrated in the LinkedIn.com attack, weak passwords were the key to thwarting the company's admittedly insufficient cryptographic shield. Password policy should specifically disallow passwords that are prone to dictionary-based attacks. One way to do this is to compare passwords to a list of common words and names in a variety of languages. Universities around the world have made such lists available as part of their open source resource projects. During user registration or password reset, any match against the known word list should simply be disallowed. Moreover, requiring mixed case, digits and punctuation characters in a sufficiently long pass phrase can geometrically increase the amount of work necessary to crack it.
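A policy check along these lines could be sketched as follows in Python. The tiny word set stands in for a real multilingual word list, and the minimum length of 12 is an assumption of mine, not a recommendation from any standard. The function names are hypothetical.

```python
import string

COMMON_WORDS = {"password", "donna", "kevin", "fido"}  # stand-in for a real word list
MIN_LENGTH = 12                                        # assumed policy value


def check_password(candidate: str) -> list:
    """Return a list of policy problems; an empty list means the password passes."""
    problems = []
    if candidate.lower() in COMMON_WORDS:
        problems.append("matches a common word or name")
    if len(candidate) < MIN_LENGTH:
        problems.append(f"shorter than {MIN_LENGTH} characters")
    if not any(c.islower() for c in candidate):
        problems.append("no lowercase letter")
    if not any(c.isupper() for c in candidate):
        problems.append("no uppercase letter")
    if not any(c.isdigit() for c in candidate):
        problems.append("no digit")
    if not any(c in string.punctuation for c in candidate):
        problems.append("no punctuation character")
    return problems


def search_space(alphabet_size: int, length: int) -> int:
    """Number of possible passwords for a given alphabet and length."""
    return alphabet_size ** length


print(check_password("Tr1cky-Horse-Battery!"))  # []
print(search_space(26, 8))                      # 208827064576
print(search_space(94, 12) > search_space(26, 8))  # True
```

The last two lines show the geometric growth at work. Eight lowercase letters allow about 209 billion combinations, and every added character or enlarged alphabet multiplies that figure again.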

Summary

This article was written to give business users a basic understanding of the principles of cryptography, password policy and the discipline required to keep their customers safe. Don't let your company become a victim of bad password management practices. It's easy to make mistakes that could cost you millions of dollars. Furthermore, make sure that your software and database developers get the training they need to protect your company. Cryptography isn't trivial to do correctly and should never be left to novices who often make incorrect assumptions about what really works to protect information. You owe it to your shareholders and investors to train those developers using the very best practices and proven cryptographic libraries.