By: Adam Pan, JD ’19

As part of this year’s Yale Cyber Leadership Forum, I gave a short demonstration of a vulnerability in the MD5 hashing algorithm that was exploited by the infamous Flame worm discovered in 2012. To add a bit of dramatic flair, I built the hack into what at first appeared to be a much more innocuous demonstration of the Elliptic Curve Digital Signature Algorithm (ECDSA), an industry-standard digital signature scheme, before revealing the actual hack that I intended to present. The preparation that went into the demo presented its own challenges and lessons, but it was the reactions to the demo that gave me the most food for thought. As I described the hack, I could see that many of the attendees already had some knowledge of the Flame attack. However, as expected, only a few of the attendees were familiar with the technical details of how Flame worked.

What surprised me was how few of the attendees seemed to appreciate why the Flame attack was so significant. The attack did not work because of some undiscovered vulnerability in the MD5 algorithm. There is no denying that Flame was designed by a highly sophisticated group, but MD5’s vulnerability to chosen-prefix collision attacks was known as early as 2007,[1] and the first practical attack, the creation of a rogue digital certificate, was demonstrated in 2008.[2] Furthermore, Microsoft’s security team was aware of the potential attack vector as early as December 30, 2008.[3] No, the attack worked because Microsoft (1) underestimated the severity of this vulnerability and (2) did not have sufficient procedures to phase out obsolete cryptographic functions such as MD5. And Microsoft was certainly not an outlier, at least with respect to MD5. Even after the disclosure of the Flame cyberattack, a cybersecurity firm reported that, after inspecting the certificates of 450 companies in the Global 2000, it found that 17% of those certificates were still based on MD5.[4]
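To make the mechanics concrete, here is a minimal sketch of why a hash collision lets an attacker reuse a signature. It is not the Flame technique itself: to keep the search cheap it truncates MD5 to three bytes and finds a collision by brute force, whereas the real attacks computed chosen-prefix collisions against full MD5. The names `weak_hash`, `sign`, `verify`, `find_collision`, and `CA_KEY` are hypothetical, and an HMAC stands in for a certificate authority's signature.

```python
import hashlib
import hmac
from itertools import count

CA_KEY = b"hypothetical-ca-signing-key"  # stand-in for a CA's private key


def weak_hash(data: bytes) -> bytes:
    """Toy hash: MD5 truncated to 3 bytes so collisions are cheap to find."""
    return hashlib.md5(data).digest()[:3]


def sign(data: bytes) -> bytes:
    """Hash-then-sign: the signing step only ever sees the digest."""
    return hmac.new(CA_KEY, weak_hash(data), hashlib.sha256).digest()


def verify(data: bytes, signature: bytes) -> bool:
    """Accept the data if its digest matches the one that was signed."""
    return hmac.compare_digest(sign(data), signature)


def find_collision(prefix_a: bytes, prefix_b: bytes):
    """Search for suffixes that make two different prefixes hash alike."""
    seen = {}
    for i in count():
        suffix = str(i).encode()
        seen[weak_hash(prefix_a + suffix)] = prefix_a + suffix
        candidate = prefix_b + suffix
        if weak_hash(candidate) in seen:
            return seen[weak_hash(candidate)], candidate


# The signer is shown only the benign message...
benign, rogue = find_collision(b"CN=benign.example;", b"CN=rogue.example;CA=true;")
signature = sign(benign)

# ...but the same signature also validates the rogue message.
print(verify(rogue, signature))
```

The point of the sketch is that the signature binds only the digest, so once two messages share a digest, whoever obtains a signature on one holds a valid signature on the other.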

Yet when I spoke to the attendees after my presentation, the focus remained on the technical prowess that enabled these attacks, not on the policies that allowed these attacks to happen. That was surprising, considering we had just spent the bulk of the last two days focused on how to craft effective policies for preventing cyberattacks. The elephant in the room was that despite excellent output from the cryptographic research community, companies have repeatedly failed to perform even basic diligence on publicly known and widely reported information, let alone confidential disclosures from white-hat hackers. For example, more than a year after Flame was reported, Yahoo! suffered a massive data breach involving every Yahoo! account that existed at the time.[5] One of the most damning revelations was that Yahoo! was still using MD5 to hash passwords as late as 2013, long after the cryptography community had come to regard the algorithm as insecure and obsolete.
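For readers who want to see the difference in practice, the following is a minimal sketch, using only Python's standard library, that contrasts a bare MD5 password hash with a salted, deliberately slow key-derivation function (PBKDF2-HMAC-SHA256). The function names and the iteration count are illustrative assumptions, not a description of what Yahoo! or any other company deployed.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # illustrative work factor; an assumption, tune as needed


def legacy_md5(password: str) -> str:
    """The obsolete approach: one fast, unsalted MD5 pass."""
    return hashlib.md5(password.encode()).hexdigest()


def hash_password(password: str):
    """Salted, slow hash: returns (salt, digest) to store for the user."""
    salt = secrets.token_bytes(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

With bare MD5, identical passwords always produce identical hashes and each guess costs an attacker almost nothing. The salt makes every stored digest unique, and the iteration count makes each guess far more expensive.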

But instead, the conversations of the day made it seem like the public sector was more interested in setting a ceiling, rather than a floor, for private sector encryption. One topic that was broached was the recent Australian data encryption law, which requires companies to provide government and law enforcement authorities with access to encrypted data and communications. Some argued that, with enough controls, the technical backdoors required by the Australian law would do little to weaken consumer privacy whilst being a boon for law enforcement officials, who have continually been thwarted by end-to-end encryption schemes. Others were more skeptical about the value of this legislation. They reasoned that such laws are futile for two major reasons. First, the fundamental theories that enable encryption are impossible to regulate. Any sufficiently competent individual will be able to transmit encrypted messages to another regardless of whether a communications provider offers built-in encryption services, because encryption can always be layered on top of an insecure form of communication. Second, the types of technological measures required by the Australian law will inevitably have a negative impact on consumer privacy. Data breaches such as Yahoo!’s would have significantly more impact if a malicious actor were also able to access the “master keys” to decrypt all user communications on a platform. This discussion reminded me of a quote from 2002:

There are known knowns. There are things we know that we know. There are known unknowns. That is to say, there are things that we now know we don’t know. But there are also unknown unknowns. There are things we do not know we don’t know.
— Donald Rumsfeld

The regulation of cybersecurity similarly deals with this spectrum of issues. The MD5 vulnerability might be a known unknown: such an attack was theoretically possible, but it was unknown whether any threat actors were exploiting it. Microsoft was aware of it, despite not knowing exactly how an attacker might utilize it. But for many companies, even something as old-hat as the MD5 vulnerability might have been an unknown unknown. Recent data breaches have shown us that not every company is as diligent as Microsoft when it comes to addressing cyberattacks. As discussed above, Yahoo!’s use of MD5 bordered on negligence. How many of those Global 2000 companies would have switched away from MD5 had they received competent advice on the security risks? Why aren’t these companies incentivized to perform due diligence? Is there another factor that can explain why so many companies fail to incorporate even basic practices? There is clearly some duty to practice “good cybersecurity,” but how far does that duty extend? Taking as given that every company should address the known knowns, how should companies address known unknowns and unknown unknowns?

One way to organize these thoughts is to categorize cyber risks as those that can be addressed through prevention (known unknowns) and those that can only be cured ex post (unknown unknowns). In this framework, Microsoft’s MD5 vulnerability was a known unknown because there was a clear path for Microsoft to take to prevent exploitation: (1) track its own usage of each cryptographic algorithm, and (2) phase out its use of an insecure algorithm as soon as vulnerabilities are discovered.
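The two steps above can be sketched as a small inventory audit. This is only an illustration under stated assumptions: the `Asset` record, the `SUNSET_DATES` policy table, the dates in it, and the inventory entries are all hypothetical, and a real organization would populate them from its own certificate stores and its own deprecation schedule.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical policy table: algorithm name -> date after which it is banned.
# The dates are illustrative assumptions, not any vendor's actual schedule.
SUNSET_DATES = {
    "md5": date(2009, 6, 30),
    "sha1": date(2017, 1, 1),
}


@dataclass(frozen=True)
class Asset:
    """One place where a cryptographic algorithm is in use."""
    name: str
    algorithm: str


def overdue(assets, today):
    """Return the assets still using an algorithm past its sunset date."""
    flagged = []
    for asset in assets:
        sunset = SUNSET_DATES.get(asset.algorithm.lower())
        if sunset is not None and today >= sunset:
            flagged.append(asset)
    return flagged


# Hypothetical inventory for illustration (step 1: track usage).
inventory = [
    Asset("licensing-server-cert", "MD5"),
    Asset("public-web-cert", "SHA256"),
]

# Step 2: flag anything that should already have been phased out.
for asset in overdue(inventory, date(2012, 5, 1)):
    print(f"{asset.name}: {asset.algorithm} is past its sunset date")
```

The code is trivial by design. The hard part is organizational: keeping the inventory complete and acting on what the audit flags.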

As an aside, it has long been accepted in the cryptography community that every deterministic encryption algorithm will eventually be rendered insecure.[6] It naturally follows that a company using any cryptographic function needs to plan for obsolescence. The fact that Microsoft clearly failed to do so raises serious doubts as to whether other multinational technology companies can succeed where Microsoft’s security team—one of the most respected in the business—has failed.

The other category of cyberattacks would be those that arise from entirely new vectors that the community has not foreseen. The tenor of our second day’s conversation, especially the panel on organizing to address cyber threats, reflected a deep worry that there are many emerging forms of cyberattacks, from disinformation campaigns to IoT. In addition, the panelists from the private sector expressed confidence that while public-private partnerships were appropriate for emerging cyberthreats, private actors were best situated to deal with

I wonder if the minimum-effort, maximum-effect policy now might be to address the known unknowns: the vulnerabilities that have been proven to exist, but not yet proven to have been used. I think that the first step to bridging the gap is to take a hard look at the current Coordinated Vulnerability Disclosure (CVD) process, which was briefly mentioned during the day’s proceedings.[7] CVD provides one way for companies to engage in ex ante prophylaxis against known unknowns, but participation is voluntary and nonbinding. Of course, voluntary frameworks such as CVD cannot compel companies to act. However, perhaps standards of liability can be based on community norms established through a CVD-like process. For example, Google’s Project Zero and other CVD participants enforce a disclosure policy under which unpatched vulnerabilities are publicly disclosed after a set period of inaction by the responsible vendor. Instead of mandating public disclosure, we could condition liability on these norms. Applying this to the MD5 example, Yahoo! would almost certainly face legal liability because it failed to adhere to a “reasonable” cybersecurity practice of not using wildly outdated cryptographic functions to protect user data.

The data on data breaches continues to show that most major hacks arise more from the carelessness or ignorance of the victim companies than from the sophistication of the attackers. As I concluded in my hacking demonstration, the security of any system is only as good as its weakest link. If the goal is to bridge the gap not only between the public and private sectors but also among individual entities in each sector, we should focus on addressing the weakest links in the digital industry and adopt a uniform policy for patching the known unknowns.


[1] Marc Stevens, Arjen Lenstra & Benne de Weger, Chosen-Prefix Collisions for MD5 and Colliding X.509 Certificates for Different Identities in Annual International Conference on the Theory and Applications of Cryptographic Techniques 1 (2007), https://link.springer.com/content/pdf/10.1007/978-3-540-72540-4_1.pdf.

[2] Alexander Sotirov et al., MD5 Considered Harmful Today in 25th Annual Chaos Communication Congress (December 30, 2008), https://www.win.tue.nl/hashclash/rogue-ca/.

[3] Microsoft Security Advisory 961509 (December 30, 2008), https://docs.microsoft.com/en-us/security-updates/securityadvisories/2008/961509.

[4] Richard Stiennon, Flame’s MD5 Collision is the Most Worrisome Security Discovery of 2012, Forbes (Jun. 14, 2012), https://www.forbes.com/sites/richardstiennon/2012/06/14/flames-md5-collision-is-the-most-worrisome-security-discovery-of-2012/#45c084d14943.

[5] Dan Goodin, Every Yahoo Account That Existed—All 3 Billion—Was Compromised in 2013 Hack, Ars Technica (Oct. 3, 2017), https://arstechnica.com/information-technology/2017/10/yahoo-says-all-3-billion-accounts-were-compromised-in-2013-hack/; Darren Pauli, Security! Experts! Slam! Yahoo! Management! for! Using! Old! Crypto!, Register (Dec. 15, 2016), https://www.theregister.co.uk/2016/12/15/yahoos_password_hash/.

[6] See, e.g., Lamont Wood, The Clock is Ticking on Encryption, Computerworld (Dec. 17, 2010), https://www.computerworld.com/article/2511969/the-clock-is-ticking-on-encryption.html.

[7] See generally Allen D. Householder, Garrett Wasserman, Art Manion & Chris King, The CERT Guide to Coordinated Vulnerability Disclosure (Carnegie Mellon Univ., Software Engineering Inst., Special Report CMU/SEI-2017-SR-022, 2017), https://resources.sei.cmu.edu/asset_files/SpecialReport/2017_003_001_503340.pdf.