A compendium of container escapes

In this Help Net Security podcast recorded at Black Hat USA 2019, Brandon Edwards, Chief Scientist at Capsule8, talks about a compendium of container escapes, and the RunC vulnerability in particular.

Here’s a transcript of the podcast for your convenience.

My name is Brandon Edwards, and I’m Chief Scientist at Capsule8. In this podcast we’ll be talking about a compendium of container escapes. A while back, we talked about escaping containers and the sorts of vulnerabilities people should be concerned with. In particular, we discussed how the RunC vulnerability had engendered all this interest, or concern, or almost shock: the trust that people were placing in containers was broken. Oh wow, an escape could happen!

I think it’s really valuable to be able to communicate and show all the other ways that sort of thing can happen: through misconfiguration, over-granting privileges, providing host mounts into the container, or kernel vulnerabilities that could compromise any of the elements of the security model of a container, which is both fragile and complex.

That covers everything from capabilities, to file system access, to whether Linux security modules are enabled or not. One thing we’ve consistently observed at Capsule8 is that the container model changes between Linux distributions: one distribution will have AppArmor and another will have SELinux, and how those are enforced to create the security framing for a container differs. As a result, a given set of capabilities, privileges, or deployment choices may break a container in one environment but not in another.
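
To see which of these frameworks is actually confining a process on a particular host, one option is to read the label the kernel reports in /proc/self/attr/current. The sketch below is a heuristic, not a complete parser: it assumes the common AppArmor ("profile (mode)") and SELinux ("user:role:type:level") label formats, and the function name classify_lsm_label is hypothetical. Newer kernels may also expose per-module attribute files, which it ignores.

```python
import re
from pathlib import Path


def classify_lsm_label(label):
    """Heuristically classify a /proc/<pid>/attr/current label.

    Assumed formats: AppArmor labels look like "docker-default (enforce)"
    or "unconfined"; SELinux contexts look like
    "system_u:system_r:container_t:s0:c1,c2".
    """
    label = label.strip("\x00 \n\t")
    m = re.fullmatch(r"(\S+) \((\w+)\)", label)
    if m:
        return {"lsm": "apparmor", "profile": m.group(1), "mode": m.group(2)}
    if label == "unconfined":
        return {"lsm": "apparmor", "profile": "unconfined", "mode": None}
    parts = label.split(":")
    if len(parts) >= 4:
        # user:role:type:level[:categories]; the type names the policy domain
        return {"lsm": "selinux", "type": parts[2]}
    return {"lsm": "unknown", "raw": label}


if __name__ == "__main__":
    try:
        print(classify_lsm_label(Path("/proc/self/attr/current").read_text()))
    except OSError as exc:
        # No LSM exposing this interface, or not running on Linux
        print(f"could not read LSM label: {exc}")
```

Running it in the same image on an Ubuntu host and on a Fedora or RHEL host would typically show an AppArmor profile on one and an SELinux type on the other, which is the kind of divergence described above.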

I think it’s good to explore these things so that people understand what makes a container and truly isolates it from other processes, so they really know whether or not they should trust certain environments, and how they should gauge the risk around those sorts of deployments.

One thing I found particularly interesting: Felix Wilhelm recently tweeted a container escape using the cgroups release agent. What was interesting about it was not really the mechanism he used, but the fact that these mechanisms exist at all, where if you have certain privileges you can get out. Part of what we want to elaborate on is that this is not a one-off thing; it’s more of a theme. With Linux in particular, you have these user-mode helper callbacks that, if you have the right privileges, you can make execute. The kernel will call back and execute user-controlled paths or programs, so if you can specify that path from within a container, it gets you out of the container.
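
One precondition for the release agent technique is access to a cgroup v1 hierarchy the container can write to (cgroup v2 has no release_agent file); mounting a fresh hierarchy instead requires CAP_SYS_ADMIN. As a defensive sketch only, assuming you just want to flag writable cgroup v1 mounts rather than reproduce the escape, the following parses /proc/self/mountinfo using the field layout documented in proc(5). The function names are illustrative.

```python
from pathlib import Path


def parse_mountinfo(text):
    """Yield (mount_point, mount_options, fs_type) for each mountinfo line.

    Layout per proc(5): ID parentID major:minor root mountpoint options
    [optional fields...] - fstype source superoptions
    """
    for line in text.splitlines():
        if " - " not in line:
            continue
        pre, _, post = line.partition(" - ")
        pre_fields, post_fields = pre.split(), post.split()
        yield pre_fields[4], set(pre_fields[5].split(",")), post_fields[0]


def writable_cgroup_v1_mounts(text):
    """Mount points of cgroup v1 hierarchies mounted read-write.

    Only cgroup v1 (fs type "cgroup", not "cgroup2") has a release_agent file.
    """
    return [
        mount_point
        for mount_point, opts, fs_type in parse_mountinfo(text)
        if fs_type == "cgroup" and "rw" in opts
    ]


if __name__ == "__main__":
    info = Path("/proc/self/mountinfo")
    if info.exists():
        for mp in writable_cgroup_v1_mounts(info.read_text()):
            print("writable cgroup v1 hierarchy:", mp)
```

A read-write mount is not proof of an escape path on its own (capabilities and LSM policy still matter), but it is a cheap signal to check.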

I guess we found it interesting because the tweet also created a lot of interest. Several companies started blogging about it and talking about it, there was a bunch of chatter on Twitter, and it further emphasized the fact that container security is misunderstood. Of course, this is possible given the scenario. So I think it’s valuable for people to explore the inner workings of Linux and understand that containers are really just tasks; they’re not VMs. They don’t have their own kernel, their own drivers, or, in many ways, really their own environment; it’s all simulated. So the perception of security around them is often misguided.

Let’s talk about some examples of where people can make mistakes, or what they should be doing to ensure that their expectations are being met from a security stance.

Unfortunately, one problem is that whenever you use orchestrators or any level of abstraction, you are almost certainly giving up some of the security benefits, because that control is removed from your direct access. Which is fine; it’s great, it makes it easier to deploy containers at scale. But those abstractions, in order to make things easy, sometimes have to disable security mechanisms. For example, Kubernetes by default disables seccomp (Secure Computing) profiles, which restrict the system calls a container can make. That mechanism reduces a lot of the attack surface within the kernel.
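
Whether seccomp is actually applied can be checked from inside the container, because the kernel reports each task’s mode in the "Seccomp:" field of /proc/<pid>/status (0 = disabled, 1 = strict, 2 = filter, per proc(5)). A minimal sketch, with a hypothetical helper name:

```python
from pathlib import Path

# Values of the "Seccomp:" field in /proc/<pid>/status (see proc(5)).
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text):
    """Return the seccomp mode named in a /proc/<pid>/status dump.

    Returns None if the field is absent (e.g. a kernel built without
    CONFIG_SECCOMP). The key is matched exactly so that newer fields such
    as "Seccomp_filters:" are not mistaken for it.
    """
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "Seccomp":
            return SECCOMP_MODES.get(int(value.strip()), "unknown")
    return None


if __name__ == "__main__":
    status = Path("/proc/self/status")
    if status.exists():
        print("seccomp:", seccomp_mode(status.read_text()))
```

Running it once in a container started by hand with docker run and once in a Kubernetes pod built from the same image is one way to surface the difference described above; under the defaults discussed here, you would expect "filter" in the first case and "disabled" in the second.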

We have an exploit that we demo which wouldn’t work if you ran the container manually with Docker, but works great when you run it with Kubernetes, because Kubernetes doesn’t enforce seccomp. You can trigger all these things you shouldn’t be able to.

Starting from that, it’s important to then ask: “OK, what are the things I care about? What sort of mechanisms should I be looking at, and am I worried about that sort of attack surface? If so, what should I be looking at with my orchestrator? What does it enable or disable? What does it enforce or not? Does it behave the same as when I run a container myself? Does it have the same restrictions or privileges as when I deploy it with my orchestrator?” Then there’s tracking the differences between orchestrators, because the other problem is that behavior depends on the environment you’re deploying a container into: whether it’s in the cloud or in your data center, who set up the orchestration, and what container engine you’re running. You need to understand those differences.

Part of what we’ve found interesting is that it’s such a diverse ecosystem that it’s hard to have a grasp on what the expectations should be. If you’re on CRI-O, if you have rkt, if it’s Docker, or if you’re using Kubernetes, where do these things change across the different levels of abstraction? It’s easy to get a false sense of security when you run a container in isolation, on your laptop, by yourself. It’s like, “OK, this looks fine. It doesn’t seem like the container can take these actions or exhibit these behaviors.” But the second you deploy it with an orchestrator, it can. The irony is that containers are supposed to eliminate the ambiguity or incongruence between environments. The whole idea is that you package software once and it acts the same way everywhere you deploy it, yet the orchestrators take that away.

I think it’s important to measure those sorts of things. Maybe there should even be tooling around it. There’s tooling on Linux and Windows to ask, “Does this binary have ASLR, is it PIE, is it NX, does it have stack canaries?” We don’t have an equivalent for containers, where we could say “run a container and tell me what’s wrong.” Maybe we should build that, actually. That’d be a good idea for something to release, so that you could measure the different environments.

But generally, I would say that if people want to assess their security profile, in addition to the differences between their orchestration and their runtimes, they should look at whether they have volume mounts into the container, and what those are. That’s a direct bridge out of the container. The second a container is given access to the host filesystem, an attacker inside it has a means to mess with stuff that’s not just in the container. Are they granting additional capabilities or privileges? Are they exposing their orchestrators, or the Docker socket? All of these are things to think about and look at.
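
Two items on this checklist, granted capabilities and an exposed Docker socket, can be checked from inside a running container. The sketch below decodes the CapEff bitmask from /proc/self/status and tests whether a Unix socket exists at Docker’s default path. The set of "risky" capabilities is an illustrative assumption, not an authoritative list; bit numbers follow linux/capability.h, and the socket path is assumed to be the default /var/run/docker.sock.

```python
import os
import stat
from pathlib import Path

# Illustrative (not exhaustive) subset of capability bits from
# <linux/capability.h> that are commonly implicated in container escapes.
RISKY_CAPS = {
    2: "CAP_DAC_READ_SEARCH",
    12: "CAP_NET_ADMIN",
    16: "CAP_SYS_MODULE",
    17: "CAP_SYS_RAWIO",
    19: "CAP_SYS_PTRACE",
    21: "CAP_SYS_ADMIN",
}


def cap_eff(status_text):
    """Return the CapEff bitmask from a /proc/<pid>/status dump."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "CapEff":
            return int(value.strip(), 16)
    raise ValueError("no CapEff field")


def risky_caps(mask):
    """Sorted names of RISKY_CAPS bits set in an effective-capability mask."""
    return sorted(name for bit, name in RISKY_CAPS.items() if (mask >> bit) & 1)


def docker_socket_exposed(path="/var/run/docker.sock"):
    """True if a Unix socket exists at path (default Docker location assumed)."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


if __name__ == "__main__":
    status = Path("/proc/self/status")
    if status.exists():
        print("risky capabilities:", risky_caps(cap_eff(status.read_text())))
    print("docker socket visible:", docker_socket_exposed())
```

For reference, a root process under Docker’s default capability set typically reports CapEff 00000000a80425fb, which contains none of the bits listed above, while a --privileged container has all of them.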

And then there’s kernel hygiene: tracking kernel vulnerabilities and applying patches when vendors have them out. I don’t necessarily think it’s wise to say “build your own kernel,” but at least know your kernel well enough that you can upgrade or change it as necessary, because distributions are often very slow at getting patches backported.
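
A first, coarse hygiene check is comparing the running kernel’s release string against a minimum version your team tracks. This is only a sketch: distribution kernels routinely backport fixes without changing the upstream version number, so the vendor’s advisories remain the authoritative source, and MINIMUM_VERSION below is a hypothetical placeholder, not a recommendation.

```python
import os
import re


def kernel_version(release):
    """Parse the leading major.minor[.patch] from a kernel release string."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part or 0) for part in m.groups())


def older_than(release, minimum):
    """True if the release's upstream version is below the minimum tuple."""
    return kernel_version(release) < minimum


# HYPOTHETICAL policy value: substitute the version your own tracking says
# contains the fixes you need.
MINIMUM_VERSION = (4, 19, 0)

if __name__ == "__main__":
    release = os.uname().release
    print(release, "below policy minimum:", older_than(release, MINIMUM_VERSION))
```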

Some distributions are a little too fast in taking on new kernel code, so that can work against you as well if you’re on a distribution that is on the bleeding edge, running the latest 5.0 series right now, or whatever the case may be. Now you’re adopting new attack surface before anyone has even vetted it or thought about whether these new mechanisms are worth having.

We see this with the userfaultfd system call, which has only been around a couple of years and has had a number of vulnerabilities associated with it. It’s also been used to help exploit a number of other vulnerabilities. It’s a balancing act: you need an understanding of what is stable and acceptable for your risk profile when it comes to new capabilities and features, while also making sure you keep up when things are discovered and fixed. Because while RunC vulnerabilities can happen, any kernel vulnerability is a portal gun that will break you out of containers.

We do a lot of interesting monitoring of Linux behaviors, including user behaviors: what users are doing on a system and how they interact with it. But I think the real value we bring for this sort of problem is that we monitor the inner workings of the kernel, so we can observe whether a process that was in a container is now operating on resources outside of that container, and raise an alert.
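
The transcript doesn’t describe how Capsule8 implements this, so the following is only a much cruder, polling-based illustration of the idea. Each namespace a process belongs to is exposed as a link such as /proc/<pid>/ns/mnt pointing to "mnt:[4026531840]"; if a process you believe is containerized reports the same mount-namespace inode as the host’s PID 1, it is no longer isolated. The sketch assumes it runs on the host with enough privilege to read other processes’ namespace links; the function names are hypothetical.

```python
import os
import re
import sys


def ns_inode(link_target):
    """Extract the inode from a namespace link such as 'mnt:[4026531840]'."""
    m = re.fullmatch(r"\w+:\[(\d+)\]", link_target)
    if m is None:
        raise ValueError(f"unexpected namespace link: {link_target!r}")
    return int(m.group(1))


def namespace_of(pid, ns="mnt"):
    """Namespace inode for a PID (reading other PIDs may need privileges)."""
    return ns_inode(os.readlink(f"/proc/{pid}/ns/{ns}"))


def escaped_pids(container_pids, ns="mnt"):
    """PIDs believed to be containerized that now share PID 1's namespace.

    Assumes this runs on the host, where PID 1 is the host init process.
    """
    host_ns = namespace_of(1, ns)
    return [pid for pid in container_pids if namespace_of(pid, ns) == host_ns]


if __name__ == "__main__":
    pids = [int(arg) for arg in sys.argv[1:]]
    if pids:
        print("in host mount namespace:", escaped_pids(pids))
    else:
        print(f"usage: {sys.argv[0]} PID [PID ...]")
```

A production monitor would more likely hook kernel events such as namespace changes rather than poll, since a short-lived escape can happen between polls.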

We can also detect kernel exploitation in general, which could be used to break out of the container. I think that sort of visibility is required for anyone who wants a good sense of “Is my container environment operating as it should be? Has it been compromised? What is the depth of that compromise? Is it just within a container, or have they managed to get out to other containers or to the host?”

I guess that’s even more important than kernel hygiene, because you should always assume there is a vulnerability that will be exploited. It’s important to be able to observe it. The assumption that “we can stop hacking” is not valid. The ability to observe, your time to respond, how you investigate, and how you scope the impact of an exploitation: those are the things people should be thinking about. We know that everything can be hacked. It’s a matter of resources and time.

So, what is your visibility? The analogy I’ve heard other people use (I definitely did not come up with it; I think maybe Dino, or actually Zane Lackey, recently tweeted about it) is “you don’t need better locks or seals on your windows, you need better cameras watching them.” Someone will pick the lock, someone will break the window, but do you know about it, and what was the impact?

I think that’s really the value we bring at Capsule8: visibility into these sorts of behaviors out of the box. We can say, “Yep, this process was in a container. Now it’s tampering with stuff on your host file system,” or it’s calling kernel functions related to changing its namespaces, or any number of other things you should never see happening. Now you can see them happening, and so you know something is nefarious.

For more information, I recommend people visit us at capsule8.com to read more about container security and general Linux security monitoring, and about detecting these attacks.