There Will Be Bugs - Issue #2

Take Note

Imagine this. You’re the world’s leading chip manufacturer, and you’re running into the limits of physics trying to make smaller, faster chips. Moore’s Law, named after one of your founders, says you’ll double performance every 18 months, and your 20-year streak is in serious jeopardy. Your engineers come up with a clever idea – let’s call it speculative execution – that makes the same chip run significantly faster, no strings attached. Moore’s Law lives on for another 20 years.

Fast forward to today. Researchers have discovered bugs in speculative execution that allow attackers to bypass nearly all security without detection, because they are bugs by design — features, really.  Fixing the bugs slows performance by as much as 30 percent; requires a complex sequence of patching hardware, operating systems and applications; and stresses IT teams to their breaking point. It is the nightmare we in the IT profession have feared since shortly after we decided a career in IT was a good idea and shortly before we realized it wasn’t a good idea at all. The fallout has been full of stress and intrigue: pulled-then-released-then-pulled-again patches, late nights of furious action, allegations of insider trading, and even angry letters from Congress.


You’d think the company responsible for this would be facing enormous public backlash, class-action lawsuits and declining sales. After all, Apple is facing all that after slowing down its older-generation phones in an effort to keep them compatible with its newest iOS release. But we live in a world where unintended bugs are seemingly more forgivable than deliberate action, regardless of impact. Companies are eager to get their hands on more processors because the ones they bought last year just got a lot slower. One of the biggest mistakes in the history of computing looks to be quite profitable for the company responsible for it.

This month, we focus on the story behind these bugs and explore the bigger picture of what it all means.

– Cris


There Will Be Bugs

The Problem

In a nutshell, the problem is that any running app can read memory anywhere in the system, regardless of who actually owns it. And that’s a big, big problem. The bugs, named Meltdown and Spectre, are actually three different design flaws (Spectre is the name of two of them, we believe due to a shortage of “cool” names for devastating security bugs). We won’t dig too much into the details of the bugs, but you can find an excellent write-up here if you’re interested.

Speculative execution is a fancy name describing how processors guess which commands to execute next instead of waiting for the apps to tell them. It’s a lot like card counting in Blackjack — if you’re good at it, you win more than you lose. In order to run those commands, the processor grabs the data it needs, but it keeps a copy of the original data just in case it guesses wrong. This dance of backing up, copying and deciding the outcome is where the problem arises. Nefarious apps can watch the flow of data and learn exactly what’s going in and out. It’s a complicated process, and the timing has to be precise. For this reason, most thought that an attack was impossible in spite of it first being theorized more than five years ago. It turns out the attacks are not that hard to pull off, though. You’ll see some videos in this issue that show just how simple it is for someone to spy on you.
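For readers who want to see the idea rather than take our word for it, here is a toy model of the "watch the flow of data" step. It is only a sketch under heavy assumptions: the cache is a Python set, the "timings" are made-up numbers rather than real measurements, and every name in it (ToyCache, speculative_read, recover_secret) is hypothetical. It is not a working exploit and does not touch real hardware. It only shows why a wrong guess that gets thrown away can still leave a trace.

```python
"""Toy model of a cache-timing side channel. Illustration only, not an exploit."""

CACHE_HIT_COST = 1     # assumed latency units for data already in the cache
CACHE_MISS_COST = 100  # assumed latency units for data that must be fetched


class ToyCache:
    """Tracks which of 256 probe slots are currently cached."""

    def __init__(self):
        self.cached = set()

    def flush(self):
        """Empty the cache so every slot starts out slow."""
        self.cached.clear()

    def access(self, slot):
        """Touch a slot and return the simulated time the access took."""
        cost = CACHE_HIT_COST if slot in self.cached else CACHE_MISS_COST
        self.cached.add(slot)
        return cost


def speculative_read(cache, secret_byte):
    """Stand-in for the processor guessing ahead.

    The guessed work touches one slot chosen by the secret value. The
    result is discarded, but the slot stays in the cache.
    """
    cache.access(secret_byte)


def recover_byte(cache, secret_byte):
    """Attacker's side: flush, trigger the guess, then time every slot.

    The attacker code never reads secret_byte directly. It is passed in
    only so the toy "processor" has something to speculate on.
    """
    cache.flush()
    speculative_read(cache, secret_byte)
    timings = {slot: cache.access(slot) for slot in range(256)}
    # The one fast slot reveals the value that was touched during the guess.
    return min(timings, key=timings.get)


def recover_secret(secret):
    """Repeat the trick one byte at a time."""
    cache = ToyCache()
    return bytes(recover_byte(cache, b) for b in secret)
```

In this model, recover_secret(b"bank-login") returns b"bank-login" even though the attacker code only ever looks at timings. Real attacks have to deal with noisy measurements and precise timing, which is the part most people assumed would make them impractical.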

The Fallout

The implications of Meltdown and Spectre are enormous. Perhaps the biggest remaining issue is with web browsers, even if you have applied every patch available. Without the use of Java or Flash, a malicious site can effortlessly read your cookies from other tabs and, say, grab your login information for your bank. Browser vendors have been working to mitigate the impact of this, but their work couldn’t even fully begin until after Intel, Microsoft and Apple figured out how they were going to fix the issue. There’s a long road ahead to address every way these flaws can be leveraged.

The real nightmare was in the cloud, though. Many irrationally view the cloud as a utopia that solves all problems. You trust super smart vendors who use super secure data centers to take better care of your data than you could on your own. That’s the idea, anyway. It turns out the cloud is just someone else’s server, and you’re sharing that server with all of their other customers. As a result, it all relies on systems being able to isolate one customer’s data from the other — something everyone believed was a long-solved problem. When you lose that protection, the premise of the cloud falls apart. None of this works anymore.

The Future

Speculative execution is too important to completely abandon, but the current fixes are not adequate. Browsers remain vulnerable to Spectre, and the cloud is still an important part of the future technology landscape. A new category of attack has been invented, turning designs that are literally hardwired into exploitable flaws that cannot easily be corrected. The less obvious dark side of this is the likely end to responsible disclosure, the principle that vulnerabilities are not disclosed to the public until a fix is available. As you’ll learn, Intel banded together with six of the largest tech companies to secretly work on fixes for more than six months, excluding many of their smaller competitors and even the U.S. government. This created big problems without clear solutions.

Expect more design flaws, difficult nights for IT teams, a long debate over the future of responsible disclosure and even some congressional hearings. The most important thing is to constantly remind yourself how little privacy you have while browsing the internet. There will be bugs. Act accordingly.


Intel Will Make How Much Off This Mess?

Nearly every server and laptop, along with quite a few phones, tablets and even appliances, sold in the past 20 years is using an affected processor. In the seven months between when Intel was informed of the issue (June 1, 2017) and when the public learned (Jan. 3, 2018), Intel continued to make and sell products that it knew contained significant defects. The company also knew the ultimate fixes would severely impact performance. In other words, Intel had sold, and then continued to sell, products it knew would not live up to spec.

Intel and other vendors like AMD and ARM plan to release products that are not affected by the security or performance issues later this year. This gives customers an awkward choice: Stick with your current, less secure and slower processor or update to a new, faster and more secure product, potentially years ahead of schedule. For many, this is a matter of personal and national security, and not just in the United States. That means it isn’t actually a choice; it is a cost, and a considerable one at that. Intel’s processors are the most expensive component of most servers and devices.

According to Digital Trends, “Intel stands to make potentially billions of dollars off of a problem it helped create.” Perhaps Intel CEO Brian Krzanich should have thought more about the potential windfall before he sold $50 million worth of his stock at the end of 2017. Of course, Intel may need to pay out big given the ever-growing class-action lawsuits. But Wall Street seems to think Intel will do just fine: its stock is up 18 percent since the news broke, in spite of the overall market being down 10 percent.


Scary Real World Things Meltdown Can Do

Watch these short videos to see Meltdown in action.


There Will Be Patches. Lots of Patches

Meltdown and Spectre set off a firestorm of patching, particularly after news was leaked prematurely. The severity of the vulnerabilities meant that patches had to be rolled out without hesitation, leaving IT teams without the time or ability to fully test them. In fact, even the vendors had not finished their testing. This reality became abundantly clear to everyone involved within days. In the three months since Jan. 3, Intel released, pulled and re-released updates many times, and each time they did, downstream vendors were forced to rework their patches.

At this point, Intel finally has released a number of stable updates, though more are on the way. In a statement, Intel’s CEO said, “We have now released microcode updates for 100 percent of Intel products launched in the past five years …” Intel published a guidance document indicating it is working its way backward toward processors that were sold up to 10 years ago. Beyond that, you’re on your own. And of course, Intel patches are a starting point. Once the microcode update is out, vendors such as VMware and Microsoft need to release their own patches to support those patches. Even applications like Firefox and Chrome need to make patches to support the patches for the patches.

This sounds less than ideal already, but the reality has been even more complex. Geoff Edelman, Rhythmic’s senior network administrator, refers to the fallout as “Patching Sadness.” He explains,

Speculative execution issues were ‘solved’ just like most vulnerabilities, with patching. I use the word ‘solved’ lightly because it took nearly three months to get patches that correctly solved the underlying problem and were truly stable.

Early attempts at issuing patches were fraught with problems, and many vendors pulled their initial patches after they were released. This put an enormous load on IT teams everywhere. The only thing worse than emergency patching is having to then immediately undo your emergency patching because that typically creates more problems than it solves. Unlike most vulnerabilities that are discovered, Spectre/Meltdown was a hardware-level issue. It was fundamental to the operation of computer systems. This meant that simply patching Windows or Linux was not sufficient to correct the problem. In the majority of cases, the BIOS (computer firmware) of the affected server or workstation would need to be updated, and if the system were running a virtualization solution such as VMware, the hypervisor would need to be updated as well. Virtualization technology being the core of many cloud providers’ businesses meant their teams were forced to apply these fixes in sequence, usually requiring each server to be rebooted at least twice and often three times to apply patches at every level. Even with automation, this meant significant downtime for each server and a huge investment of time and resources by providers that could have been used elsewhere. When initial patches were found to be so unstable as to cause unintended system reboots, everyone was scrambling once again. As more stable patches were released, the process needed to be repeated again.

Once the dust settled and stable patches were in place, processors performed considerably slower than originally designed. Speculative execution was critical to high-performance processors, and it was effectively eliminated overnight. The performance problem isn’t going away anytime soon and can’t be solved with another patch. The processors themselves must be redesigned, and even Intel’s coming ninth generation of processors will continue to be affected. Their 10th generation of processors, coming next year, will have a proper fix.

We continue to be amazed at the broad implications of this. Like Equifax, which will make billions selling credit-monitoring services to the people whose credit they compromised, Intel will make even more selling processors to companies that simply need to get back to the level of performance they used to have. And in spite of keeping it all a secret for six months, they still botched the fix multiple times.
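To put a rough number on what "getting back to the level of performance they used to have" could mean, here is a back-of-the-envelope sketch. It assumes the worst-case 30 percent slowdown mentioned at the top of this issue and assumes capacity scales evenly with server count, which real workloads rarely do. The function name servers_needed is ours, purely for illustration.

```python
def servers_needed(current_servers, slowdown_percent):
    """Servers required to match pre-patch throughput after a slowdown.

    Assumes every server loses the same share of its performance and that
    work spreads evenly across servers (a simplification).
    """
    if not 0 <= slowdown_percent < 100:
        raise ValueError("slowdown_percent must be at least 0 and below 100")
    remaining_percent = 100 - slowdown_percent
    # Integer ceiling division: round up, because you cannot buy part of a server.
    return -(-current_servers * 100 // remaining_percent)
```

Under those assumptions, servers_needed(100, 30) returns 143. A 30 percent slowdown does not mean buying 30 percent more hardware. It means buying about 43 percent more, because each patched server now delivers only 70 percent of what it used to.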


Big 7 Defend Their Embargo to Congress

In June, Google’s Project Zero notified Intel that the company had a significant issue on its hands, but the rest of us didn’t know until January because key players in the industry agreed to an embargo. Eventually, the circle of people who knew grew big enough, and the news leaked about a week before Intel had planned to disclose the issue. This led to a scramble of cloud providers mass rebooting servers on customers to complete their patches before the attacks became widely known. And it led to an even bigger scramble amongst the vast majority of cloud providers who were not part of Intel’s secret club.

Needless to say, an embargo like this raises questions. Responsible disclosure is a standard practice in security — things don’t work well if every vulnerability lands on Twitter without the vendor having the opportunity to fix it first. Researchers respect the system because they get the credit for their work without triggering unpopular panics. However, the scope of these issues went far beyond any past example of responsible disclosure. While a handful of vendors were notified, many of their competitors — equally impacted — were left in the dark. Somehow, nobody managed to inform the U.S. government, even though it appears China perhaps was notified in advance. Intel and others already have defended themselves to Congress, but the end may not be in sight.

In summary, a small group of the largest tech companies made the collective decision to keep Meltdown and Spectre a secret from their customers, competitors and the U.S. government. Not surprisingly, this resulted in letters from Congress, along with the start of what promises to be ongoing congressional testimony and hearings.

Intel invoked the long-standing principle of responsible disclosure, surprising no one. However, their defense has some people raising eyebrows, wondering why Intel didn’t feel any U.S. government organizations, such as the United States Computer Emergency Readiness Team (US-CERT) or the Department of Homeland Security (DHS), could provide assistance. In fact, it is common in the case of responsible disclosure to provide US-CERT with high-level information to ensure an orderly response once fixes are available. Instead, Intel stated, “before the leak, Intel disclosed information about Spectre and Meltdown only to companies who could assist Intel in enhancing the security of technology users.” Intel added that it planned to notify the government ahead of the public announcement and “expedited its plans … and promptly briefed governments and others …” after the story leaked on Jan. 3.

The other companies either lined up or hid behind Intel. Amazon’s response boiled down to “Intel required all information it provided … be subject to a non-disclosure agreement.” Apple said ARM — not Intel — told Apple of the flaws and that Apple had no role in the creation of the embargo, though they agreed with it in principle. Google and Microsoft were particularly bold, pointing out that the companies were following US-CERT best practices for responsible disclosure. Of course, they left out the part about not including US-CERT.

This was an unprecedented situation that affected nearly every vendor of technology, and these seven companies made a collective decision to protect themselves first, leaving competitors and customers alike to do their best to take effective action after the news leaked. Their approach didn’t work — news still leaked before fully tested patches were ready, and chaos ensued. Needless to say, we think this was the wrong decision. You can read each vendor’s response below and decide for yourself.


In the News

© Copyright 2019 Rhythmic Technologies, Inc. All Rights Reserved.