What The LinkedIn Hack Taught Us About Storing Passwords

In June 2012, an anonymous hacker posted a list of 6.5 million password hashes belonging to LinkedIn users on a Russian hacker forum. It was soon discovered that the passwords had been hashed using an outdated and vulnerable hashing algorithm - and were also unsalted. Lawsuits soon followed. So what are 'hashing' and 'salting', and can we trust big organizations to keep our secrets safe?

Hosted By

Ran Levi

Exec. Editor @ PI Media

Born in Israel in 1975, Ran studied Electrical Engineering at the Technion Institute of Technology, and worked as an electronics engineer and programmer for several High Tech companies in Israel.
In 2007, he created the popular Israeli podcast Making History, which had over 14 million downloads as of Oct. 2019.
He is the author of 3 books (all in Hebrew): Perpetuum Mobile: About the history of Perpetual Motion Machines; The Little University of Science: A book about all of Science (well, the important bits, anyway) in bite-sized chunks; Battle of Minds: About the history of computer malware.

In June 2012, an anonymous hacker posted a list of 6.5 million hashed passwords on a Russian hacker forum. They were posted there, apparently, to crowdsource the cracking process. Members of this forum, who obviously had a lot of experience dealing with hacking and passwords, played around with the published passwords and cracked some of them. They discovered that many of the cracked passwords contained the word “LinkedIn”. Since many people use the name of the website they’re registering with as part of their chosen password – not a good idea, obviously – many hackers suspected that the posted passwords were related to LinkedIn.

A few days after the alleged password leak was revealed, LinkedIn tweeted:

“Our team continues to investigate, but at this time, we’re still unable to confirm that any security breach has occurred. Stay tuned here.”

However, in a post on their blog, on the very same day and concerning the same alleged incident, LinkedIn representatives had already changed their language:

“It is worth noting that the affected members who update their passwords and members whose passwords have not been compromised benefit from the enhanced security we just recently put in place, which includes hashing and salting of our current password databases.”

That was at least some kind of recognition on the part of LinkedIn that password protection measures needed to be improved. 

The Lawsuits

Following the attack, two lawsuits were filed against LinkedIn – one in June 2012 and the other in November. The one from November was actually an amended version of the one from June, so we’ll focus only on the November one.

It was filed on behalf of two paying LinkedIn users – Premium users, in LinkedIn terms – who acted as class representatives for all LinkedIn premium users supposedly affected by the breach. The lawsuit asked for “injunctive and other equitable relief,” plus restitution and damages for the plaintiffs and all other members of the class: 5 million dollars in all. The lawsuit alleged that the way LinkedIn protected users’ information was not sufficient, even though LinkedIn’s privacy policy clearly stated that:

“Personal information you provide will be secured in accordance with industry standard protocols and technology.”

The plaintiffs alleged that LinkedIn’s security protocol at the time of the breach was old and weak. As the lawsuit stated:

“The problem with this practice is two-fold. First, SHA-1 is an outdated hashing function, first published by the National Security Agency in 1995. Secondly, storing users’ passwords in hashed format without first ‘salting’ the password runs afoul of conventional data protection methods, and poses significant risks to the integrity of users’ sensitive data.”

The plaintiffs argued that had they known that LinkedIn was using substandard protection, they wouldn’t have paid for the premium account.

“When signing up for and purchasing a ‘premium’ account, Plaintiffs and the members of the Class relied on LinkedIn’s representation that it uses ‘industry standard protocols and technology’ to preserve the integrity and security of their personal information”.

The 2012 lawsuit against LinkedIn kept repeating the phrases “industry standards” and “common standards” to make the point that LinkedIn wasn’t using them at the time. But in reality, password protection standards weren’t that clear in 2012. The plaintiffs were probably aware of that, and so the lawsuit doesn’t refer to any specific set of standard rules or even recommendations, and uses the word “standards” in a rather vague and general way.

In fact, in 2012, the only set of rules that one can think of as a “standard”, was already 8 years old. It’s a list of recommendations titled “Electronic Authentication Guideline”, published in 2004 by NIST – The National Institute of Standards and Technology. 

There’s only a handful of paragraphs there that deal with passwords, specifically with the vulnerability of passwords chosen by humans, as opposed to randomly generated ones, which are, usually, a lot more difficult to crack. Hashing is also mentioned in the document – but just as a general way of protecting passwords. No specific algorithms are referred to. The practice of salting, mentioned in the lawsuit – is not to be found anywhere in the NIST 2004 guidelines. 

To summarize: in 2012 there weren’t any real password-keeping standards to abide by. However, even the little effort that LinkedIn did make to protect its users’ passwords was evidently poor. How poor? Let’s dig a little deeper into the plaintiffs’ claims. If you remember, they argued that LinkedIn used an outdated hashing function, SHA-1, and that it didn’t use salting at all. So – what is hashing exactly, and why is adding salt so important?

Hashing

A Hash is a sequence of seemingly random characters. I use the word ‘seemingly’ because the hash is anything but random: it is the output of a special type of function called a Hash Function. A hash function takes a string of plaintext characters of arbitrary length as its input, and outputs a string of different characters of fixed length – often 256 bits or 512 bits long (SHA-1, the function named in the lawsuit, produces 160 bits). For example – say your password is a totally fictitious phrase like… I don’t know… natenelsonishandsome. If you plug that into a Hash Function, its output will probably look something like this:

6723bda23e999e0ca9

It is this random-looking string that is kept on the server, instead of the original password. It’s also important to note that the same input string will always result in the same hash being generated by the hash function. 
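To make these two properties concrete, here is a minimal Python sketch. It assumes SHA-256 from the standard hashlib module, and the helper name hash_password is made up for illustration; nothing here is meant to reproduce the exact example hash shown above.

```python
import hashlib

def hash_password(password: str) -> str:
    """Return the SHA-256 digest of a password as a hex string (illustrative helper)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

short_hash = hash_password("a")
long_hash = hash_password("natenelsonishandsome")

# The output length is fixed regardless of the input length:
# SHA-256 always yields 256 bits, i.e. 64 hex characters.
print(len(short_hash), len(long_hash))

# The same input always produces the same hash.
print(hash_password("natenelsonishandsome") == long_hash)

# Changing a single character of the input gives a different hash.
print(hash_password("natenelsonishandsomf") == long_hash)
```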

You might be asking yourself – what’s the difference between a hash and good old encryption? I mean, both take a string as the input, and produce a convoluted string of characters at the output. 

The answer is that while encryption is bi-directional in nature, a hash is unidirectional. Meaning, in encryption, if you have the output string and the encryption key – you can reverse the encryption and get the original text back: after all, that’s why we use encryption for messaging. But a hash cannot be so easily reversed: that is, even if a hacker has the hash, she can’t reverse the hashing process to get the plaintext input. 

And that’s what makes hashing so useful for storing passwords. When a user inputs their password in the login form, it’s easy to take that input, run it through the hash function and compare the output with the previously stored hash: since a hash function will always produce the same hash from the same input string, if both hashes are identical – we can be certain that the user entered the correct password. But even if the server is breached and an attacker gets hold of all the passwords stored on it – they still can’t tell what the original, plaintext passwords, were. 
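The login check described above might be sketched roughly as follows. The user table, the user name and the function names are hypothetical, and the hash is deliberately plain and unsalted, because that is the scheme being described at this point in the story, not a recommendation.

```python
import hashlib
import hmac

def hash_password(password: str) -> str:
    """Plain, unsalted SHA-256 hex digest (illustrative only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical user table: the server keeps only hashes, never the plaintext.
stored_hashes = {"alice": hash_password("correct horse battery staple")}

def check_login(username: str, attempt: str) -> bool:
    """Hash the attempt and compare it with the stored hash."""
    stored = stored_hashes.get(username)
    if stored is None:
        return False
    # compare_digest compares the two strings in constant time.
    return hmac.compare_digest(stored, hash_password(attempt))

print(check_login("alice", "correct horse battery staple"))  # True
print(check_login("alice", "wrong guess"))                   # False
```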

Hash Tables & Rainbow Tables

Without going into the mathematics behind the hashing process – there is one simple and important fact about it that makes it vulnerable. While reversing the hash algorithm to get a plaintext password from a stored hash is extremely computationally intensive, there are simpler tools – kind of cheats, if you like – that can allow an attacker to turn hashes back into their respective passwords. 

Two such tools are Hash Tables and Rainbow Tables. A Hash Table is a pre-computed lookup table with plaintext strings in one column – and the corresponding hash values in the other. There are many such hash tables already floating around on the internet, populated with many well known and popular passwords, and their corresponding hashes for various hash functions. An attacker can compare the hash value they have in their possession with the pre-computed hashes in the table – and if they find a match, they now have the corresponding plaintext password. 
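A toy version of such a lookup might look like the sketch below. The password list is a tiny made-up sample, SHA-1 is used only because it is the function named in the lawsuit, and the "leaked" hashes are generated on the spot rather than taken from any real breach.

```python
import hashlib

def sha1_hex(text: str) -> str:
    """SHA-1 hex digest of a string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# A tiny, made-up sample of common passwords. Real tables hold many millions.
common_passwords = ["123456", "password", "12345678", "qwerty", "linkedin"]

# Pre-compute once: hash in one column, plaintext in the other.
lookup_table = {sha1_hex(p): p for p in common_passwords}

# Stand-ins for hashes taken from a breached server.
leaked_hashes = [sha1_hex("12345678"), sha1_hex("Tr0ub4dor&3-unlikely")]

# Cracking is now a dictionary lookup per leaked hash.
cracked = [lookup_table.get(h) for h in leaked_hashes]
print(cracked)  # ['12345678', None]
```

The common password falls immediately, while the one that is not in the table stays unknown.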

Naturally, Hash Tables can get extremely large: tens or even hundreds of gigabytes of information. Rainbow Tables address this problem: without digging into the technical details, Rainbow Tables are similar in nature to hash tables, but are considerably smaller. They trade storage for compute power, taking up less space at the cost of more computation during each lookup.

Hash and Rainbow tables, then, allow attackers to easily crack some hashed passwords – especially weak or common passwords. And here, finally, is where salting is needed.

Salting

A Salt is a value that is appended to the plaintext string, so that when the combined string is run through the hash function – it produces a completely different hash value, thus rendering hash and rainbow tables ineffective. 

Here’s an example. Say my password is 12345678: a weak and, sadly, an extremely common password. Since this password is so common, it’s highly likely that any rainbow table I download from the internet will contain its pre-computed hash – which means that if the server is breached and the bad guys get a hold of its stored hashes, I’m basically screwed. 

But say that the server, before computing the hash for my weak password, appends a salt – a random string of characters – to the password. For example, say the salt is AT3UT, then the combined string of characters will be 12345678AT3UT. When this new string is then processed by the hash function, we get a completely new hash value – one very different from the expected hash value of the original weak password. If the attacker then gets hold of this new hash and compares it to the contents of his rainbow table – he’ll find no matches, and thus the original password, despite being weak and common, will still be practically uncrackable – all thanks to the appended salt. 
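The same example, sketched in Python. SHA-1 is assumed here because it is the function named in the lawsuit, and the three-entry table stands in for a real pre-computed table built from bare, unsalted passwords.

```python
import hashlib

def sha1_hex(text: str) -> str:
    """SHA-1 hex digest of a string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

password = "12345678"
salt = "AT3UT"  # the salt from the example above

unsalted = sha1_hex(password)
salted = sha1_hex(password + salt)  # hash of "12345678AT3UT"

# A stand-in for a pre-computed table of common, unsalted passwords.
table = {sha1_hex(p): p for p in ["123456", "password", "12345678"]}

print(table.get(unsalted))  # 12345678
print(table.get(salted))    # None
```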

And it is this practice of ‘Salting’ passwords before hashing them that was probably missing from LinkedIn’s password storing mechanism. LinkedIn never admitted to this publicly – but according to a Reuters report, several security experts who examined the stolen passwords were pretty confident that this indeed was the case. 

The End of the Trial

So, even though there weren’t any real standards at the time – LinkedIn was still clearly awful at protecting its users. But the judge wasn’t impressed. He pretty much refused to address the password hashing and salting issue AT ALL – and in 2013 he dismissed the claims altogether.

“Any alleged promise LinkedIn made to paying premium account holders regarding security protocols was also made to non-paying members. Thus, when a member purchases a premium account upgrade, the bargain is not for a particular level of security, but actually for the advanced networking tools”.

In other words: you didn’t pay for security protocols. You paid for professional networking tools. Simple as that. The plaintiffs didn’t give up, and rightly so. They appealed – with some minor success. In March 2015 LinkedIn agreed to pay $1.25 million to settle the case – distributing the money among 800,000 Americans, who, allegedly, were damaged by the breach. But if you count all administration costs and lawyers’ fees, taken out of that $1.25 million, then each compensated user got something like… $1. Hurray.

Well, if not in a huge compensation, LinkedIn users could at least rejoice in the arrest of Yevgeni Nikulin – the hacker who stole the passwords to begin with. On October 5th, 2016, this 29-year-old Russian citizen was caught at a restaurant in Prague, following a trip to Eastern Europe with his girlfriend. He was later extradited to the USA, and sentenced there to 88 months in prison.

Modern Standards for Keeping Passwords

In 2017, NIST published – for the first time – a detailed set of recommendations concerning passwords. 

As opposed to their initial, and very general, document from 2004, which dealt only with a few theoretical aspects of using passwords online, the 2017 recommendations go into detail about best practices concerning passwords. And every couple of years, they update the list, adding new practices and removing outdated ones.

Interestingly, NIST guidelines don’t deal only with the backend aspect of password security, but also with how the notion of password security should be communicated to the users. Surprisingly, some of their new guidelines even go against what used to be common sense, in securing people’s passwords. For example, add a ‘show password’ option in the Log In form, so that users will not be forced to type their passwords blindly – and stop forcing your users to change their passwords every so often. Changing passwords frequently only makes people keep their old password, with a little dollar sign at the end, or some other minor change. Hackers are aware of it, of course, and so they can quickly crack the new password, in case they know the older one. Unless there’s a reason to change it, NIST doesn’t recommend any kind of password expiration.

As for hashing, NIST is very clear about SHA-1 being too weak. In fact, in 2015, even before the 2017 guidelines were published, NIST specifically referred to SHA-1 on their official website:

“Federal agencies should stop using SHA-1 for generating digital signatures, generating timestamps and for other applications that require collision resistance.”

Salting is easier. NIST simply recommends doing it. And if the hashing algorithm is not as weak as SHA-1, salting can definitely serve as another layer of protection. 

But nothing in security is that easy. Let’s assume, for a moment, that LinkedIn did implement salting before the 2012 breach. Would that be enough to protect its users’ passwords after they were exfiltrated by the attacker? 

On June 11th, 2012, shortly after the original breach, Thomas H. Ptacek, a security researcher with Matasano Security, was interviewed on the krebsonsecurity.com website. He was asked about the LinkedIn event, and whether salting would actually force attackers to expend more resources to crack the password hashes.

Surprisingly, Thomas’s answer was:

“That’s actually another misconception, the idea that the problem is that the passwords were unsalted. UNIX passwords, and they’ve been salted forever, since the 70s, and they have been cracked forever. The idea of a salt in your password is a 70s solution. Back in the 90s, when people broke into UNIX servers, they would steal the shadow password file and would crack that. Invariably when you lost the server, you lost the passwords on that server.”

In simple words, Thomas is saying that salting is almost worthless – if the salts themselves are kept on the same server as the passwords file, and are both stolen in the breach. 

Ptacek is not alone. Professor Bill Buchanan, a Scottish computer scientist and cryptography researcher, wrote about the 2012 LinkedIn breach on his blog at medium.com. He too believes that salting was a very minor issue in the LinkedIn case:

“The shocking truth is that the passwords themselves would have been cracked even with salting. If the salt is kept with the hashed password, as most systems do, the weak passwords which come from dictionaries would have been easily cracked. Many organizations think they are safe as they use salted passwords, but they are not if they use weak passwords. We can add salt, but if the intruder gets the salt, they can try popular passwords.”
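Buchanan's point can be illustrated with a small sketch, under the assumption that the salt was stolen together with the hash. The function names, the salt and the four-word dictionary are all made up for the example.

```python
import hashlib

def salted_sha1(password: str, salt: str) -> str:
    """SHA-1 hex digest of the password with the salt appended."""
    return hashlib.sha1((password + salt).encode("utf-8")).hexdigest()

# What the intruder walks away with: the salt sits right next to the hash.
stolen_salt = "AT3UT"
stolen_hash = salted_sha1("12345678", stolen_salt)

# A tiny stand-in for a dictionary of popular passwords.
dictionary = ["123456", "password", "12345678", "qwerty"]

def crack(target_hash: str, salt: str, candidates: list) -> object:
    """Try each candidate with the known salt; return the match or None."""
    for candidate in candidates:
        if salted_sha1(candidate, salt) == target_hash:
            return candidate
    return None

print(crack(stolen_hash, stolen_salt, dictionary))  # 12345678
```

The salt defeats pre-computed tables, but it does nothing to stop the attacker from simply hashing popular passwords together with the salt they already hold.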

The bottom line is that salting is effective – but only if done correctly. Unfortunately, there are quite a few ways salting can be done poorly. For instance: reusing a salt, that is, hard-coding a single salt into the program or randomly generating it only once, instead of generating a fresh one for each password. Using salts that are too short is pointless too, as the hacker can build a lookup table for every possible salt.
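Avoiding those two mistakes might look something like the sketch below: a fresh, 16-byte random salt is generated for every password. The use of PBKDF2 (a deliberately slow, iterated hash from Python's standard library) and the iteration count are assumptions added for this example rather than something taken from the episode, and the function names are made up.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor, chosen only for illustration

def make_record(password: str) -> tuple:
    """Return (salt, digest) for storage, using a fresh random salt every time."""
    salt = secrets.token_bytes(16)  # never hard-coded, never reused
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

# Two users with the same weak password still get different salts and digests.
salt_a, digest_a = make_record("12345678")
salt_b, digest_b = make_record("12345678")
print(salt_a != salt_b, digest_a != digest_b)
print(verify("12345678", salt_a, digest_a))
```

Because every record has its own salt, a table built for one user's salt is useless against every other user.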

Could the attacker have cracked the hashed passwords had LinkedIn used salting when storing them? We will never know the answer to that question for certain – but the lesson here is that we – the users – simply cannot trust even the most respectable companies and organizations to take good care of our most sensitive secrets. Simply ticking the box that says ‘Salting’ isn’t enough. Like most things regarding cryptography and security, it has to be done properly – or else it won’t be strong enough to resist cracking attempts.

Troy Hunt, founder of haveibeenpwned.com, said as much in an interview for our podcast in 2020:

“[Troy] In terms of the organization itself and their responsibility, well look very often they’re just simply not taking this seriously. I mean I am amazed at how many instances of MD5 passwords we see even today.

Just yesterday, I loaded a data breach from a vBulletin forum, which inevitably someone just hadn’t updated or patched for years and years and years, which is still MD5. So I have a sense that organizations don’t understand that… Let’s say in the case of vBulletin or any other software package that people get off the shelf and run, software is a living, breathing thing for all intents and purposes. So that needs watering and feeding and caring that needs to evolve.

I find even in cases where I speak to people in organizations, I had another one from a breach just last month where I was speaking to someone in the organization involved and I said, “Look, are you aware your passwords are MD5?” And in their mind they said, “Well, that’s only going to be a problem if someone’s using a password that’s in a dictionary.” I was like, “Well, first of all you can calculate something like 20 billion MD5 hashes a second. So no that’s not right and second of all, just about every password’s in a dictionary these days.” So I don’t think that there’s enough understanding within organizations about what the actual risks are.”
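To get a feel for the figure Troy quotes, here is a back-of-the-envelope calculation. It assumes the 20 billion hashes per second from the interview and counts every possible 8-character password over two alphabets, which are arbitrary examples chosen for illustration.

```python
HASHES_PER_SECOND = 20_000_000_000  # the MD5 rate quoted in the interview

def seconds_to_exhaust(alphabet_size: int, length: int) -> float:
    """Time to try every password of the given length over the given alphabet."""
    return alphabet_size ** length / HASHES_PER_SECOND

lower8 = seconds_to_exhaust(26, 8)  # 8 lowercase letters
alnum8 = seconds_to_exhaust(62, 8)  # 8 letters (both cases) and digits

print(f"{lower8:.1f} seconds")      # 10.4 seconds
print(f"{alnum8 / 3600:.1f} hours")  # 3.0 hours
```

At that rate, a password does not need to appear in any dictionary to fall: every 8-character combination of letters and digits can be tried in about three hours.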

LinkedIn’s poor track record in that regard – using an outdated hash function, SHA-1, without even salting the original passwords – suggests that even if LinkedIn had used salting, it probably wouldn’t have taken great care to implement it correctly. And LinkedIn isn’t alone: Google, Facebook and Instagram all admitted in 2019 to storing millions of passwords in cleartext – and in Google’s case, in particular, this practice went on for no less than 15 years…

More Passwords Surface

The LinkedIn saga didn’t end in 2012. 

On May 18, 2016, four full years after the original event, a hacker who goes by the name “Peace” told the website Motherboard that he was trying to sell a list of passwords and account details of LinkedIn users, from the same 2012 breach. This time, the list contained no less than 167 million passwords.

It’s a slightly surprising number, as it clearly exceeds the total number of LinkedIn users in 2012. Perhaps it was that big because it contained all sorts of other sensitive hacked information, not necessarily just from the LinkedIn breach.

But in any case, the relevant number was 117 million. That’s the number of hashed passwords that seemed to belong to LinkedIn users at the time of the 2012 hack. And that’s obviously a much larger number than the previous one, 6.5 million: it’s about 18 times bigger. Motherboard cites one of its sources at LeakedSource, who said:

“It is only coming to the surface now. People may not have taken it very seriously back then as it was not spread. To my knowledge the database was kept within a small group of Russians.”

This time around, the ramifications were worse. Some high profile users got hurt, directly. One of them was Mark Zuckerberg, CEO and Co-Founder of Facebook. In June 2016 Zuckerberg’s Twitter, Pinterest, and LinkedIn accounts were briefly hacked and defaced by a group that identified itself as OurMine Team.

True, it is difficult to verify that Zuckerberg’s hack was directly related to the LinkedIn breach, but it seems pretty clear if you consider the timing, only a few weeks after the news about the bigger list was published, and especially if you consider the message the hackers tweeted from Zuckerberg’s own Twitter account:

“Hey @finkd [Zuckerberg’s Twitter account name], you were in [the] Linkedin Database.”

If Mark Zuckerberg’s hack was actually related to the LinkedIn breach, then it seems that he was reusing the same passwords on several websites. A big no-no.

Epilogue

The story of the LinkedIn hack demonstrates the evolving nature of cybersecurity standards, and why organizations need to stay vigilant and constantly improve their systems to follow the updated recommendations.

But it also teaches that we, the users, should not trust these organizations with our passwords blindly. Even large and “respectable” companies can sometimes drop the ball when it comes to keeping our secrets safe. That is why we should be proactive when it comes to our own safety: enable Two Factor Authentication, avoid reusing the same passwords on different websites – and never ever use the password “password”. No courtroom in the world would save you, if you do.