Deepak Goel, CTO, D2iQ

March 15, 2023

So, you want to deploy air-gapped Kubernetes, huh?

So, you want to deploy Kubernetes in an air-gapped environment, but after months of grueling work, you’re still not up and running. Or maybe you’re just embarking on the journey but have heard the horror stories of organizations trying to manage their Kubernetes deployment in air-gapped environments without success.


If you’re working in a mission-critical or life-critical environment, be it a hospital or the military, you likely have critical systems and sensitive data you want to protect from data theft or security breaches. Air-gapping is ideal for that purpose, but it’s also immensely tricky.

Why is air-gapped Kubernetes so hard?

One of the reasons Kubernetes deployments in such environments so often struggle or outright fail is that many organizations don’t plan the architecture properly in advance.

Kubernetes was designed to be used online and, in that setting, there’s a wonderful ease of use that comes with sharing and delivering container images from publicly hosted container registries on the web. In fact, container configurations usually assume by default that an image can be pulled from a public registry such as Docker Hub.

This very design turns out to be an Achilles’ Heel for air-gapped environments, where you need to host, secure, and make available your own fault-tolerant container registry, through which you need to manage the whole cluster and application lifecycle. This isn’t easy and as a starting point, you need to have a comprehensive understanding of Kubernetes.
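To make that default assumption concrete, here is a minimal Python sketch of how an unqualified image name is commonly expanded to Docker Hub, and how the same reference could be rewritten to point at a registry you host yourself. The host `registry.internal.example:5000` and the function names are hypothetical, and the parsing is deliberately simplified (it ignores digests and other edge cases), so treat it as an illustration rather than a complete reference parser.

```python
from typing import Tuple

DEFAULT_REGISTRY = "docker.io"   # registry implied when a reference names none
DEFAULT_NAMESPACE = "library"    # namespace used for Docker Hub official images


def split_registry(image: str) -> Tuple[str, str]:
    """Split an image reference into (registry, remainder)."""
    first, sep, rest = image.partition("/")
    # The first path component is treated as a registry host only if it
    # looks like one: it contains a dot or a port, or it is "localhost".
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest
    # No registry given, so Docker Hub is assumed. Single-component names
    # such as "nginx" also get the "library/" namespace.
    if not sep:
        image = f"{DEFAULT_NAMESPACE}/{image}"
    return DEFAULT_REGISTRY, image


def to_private(image: str, private_registry: str) -> str:
    """Rewrite a reference so it is pulled from a private registry instead.

    Note: the source registry name is dropped, so two upstream registries
    with the same repository path would collide in this simple scheme.
    """
    _registry, remainder = split_registry(image)
    return f"{private_registry}/{remainder}"


if __name__ == "__main__":
    private = "registry.internal.example:5000"  # hypothetical host
    for ref in ["nginx:1.25", "bitnami/redis:7", "quay.io/coreos/etcd:v3.5.9"]:
        print(ref, "->", to_private(ref, private))
```

In an air-gapped cluster, every reference in every manifest has to resolve to something like the rewritten form, which is why the registry ends up at the center of the whole lifecycle.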

So let’s have a look at some of the biggest mistakes that people make, and the considerations you should be making when deploying Kubernetes in air-gapped environments.

Agility in transfer process

One of the first things to consider – and likely one of the biggest factors that will make or break your air-gapped deployment – is the agility in the transfer process of your artifacts.

Operating in an air-gapped environment does not automatically improve security. You still need to regularly update your apps and platform, but an extremely long requirement-gathering and transfer process makes the environment harder to maintain and serves as a deterrent to rolling out new features and security patches. Before long, that deterrent can lead to you falling behind the rest of the community, and suddenly you’re a lot more vulnerable than you ever intended to be.

What can you do about that? This is where migration clusters come in.

Migration cluster

One of the best things you can do to mitigate a long requirement gathering and transfer process is to have a migration cluster. In essence, this involves running a non-air-gapped cluster in parallel with the air-gapped one to assist it. In practical terms, these clusters are identical, except one is online and the other is not.

In the long run, the non-air-gapped cluster functions a bit like a mirror to the air-gapped one, and the big benefit is that you can remediate bugs, gather requirements, speed up delivery, help developers avoid missing things, and help meet security requirements. You can then even use it as a deployment without the need for an image repository.

As an example, a cruise liner might have things running on AWS but also out in the ocean. They might actually have an internet connection in the ocean, but they don’t necessarily want that cluster to operate online because they want to reserve the connection for use by their customers. So, they’ll start with an internet-connected system, do all the testing and development in there, then package it up for their air-gapped system.

These migration clusters are also handy for training. When dealing with air-gapped environments, you usually have security requirements, and getting personnel in can be tricky. You may recruit someone who isn’t allowed access until they have passed background checks, for example. However, you can give them access to the internet-connected cluster because it doesn’t hold any of the proprietary, secret, or customer data that you want to protect by air-gapping.

That internet-connected cluster is also where you can do upgrade tests, so you don’t have to worry about it when you get to the air-gapped environment.
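One way to keep the package-and-transfer step described above quick and repeatable is to ship a checksum manifest with every bundle, so the receiving side can confirm that nothing was lost or altered in transit. The Python below is a minimal sketch under that assumption; the function names and the directory layout are hypothetical, and it is not tied to any particular packaging tool.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(bundle_dir: Path) -> Dict[str, str]:
    """Map each file's path (relative to the bundle) to its checksum.

    Run this on the connected side before the transfer.
    """
    return {
        p.relative_to(bundle_dir).as_posix(): sha256_file(p)
        for p in sorted(bundle_dir.rglob("*"))
        if p.is_file()
    }


def verify_bundle(bundle_dir: Path, manifest: Dict[str, str]) -> List[str]:
    """Return a list of problems found; an empty list means the bundle matches.

    Run this on the air-gapped side after the transfer.
    """
    problems = []
    for rel, expected in manifest.items():
        p = bundle_dir / rel
        if not p.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_file(p) != expected:
            problems.append(f"checksum mismatch: {rel}")
    return problems
```

The manifest itself would travel with the bundle (for example as a JSON file), and a failed verification becomes a reason to stop before anything is loaded into the air-gapped cluster.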

Image repository

So, what comes after the migration cluster? An image repository. While you can technically complete a deployment without one, you won’t get very far. Even at Day 0, it’s best to have one.

Why have an image repository? First, you need it to host your own containerized applications in an air-gapped environment. Second, it’s your first line of defense for supply chain security. Using your own image repository means that you can provide an approved set of container images that are signed and pre-scanned for vulnerabilities. All the image repository technologies, be it Harbor, Nexus, or Artifactory, provide some form of scanning. This gives you supply chain control over all the images you’re bringing in, so no one brings in unwanted images unnoticed; you can even control who loads images into the cluster. It provides the level of security that you sought when you initially started with an air-gapped environment.
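As a sketch of the "approved set" idea, the Python below checks image references against an allowlist keyed by content digest. The repository name, the allowlist contents, and the function names are all hypothetical, and the digest is computed from stand-in bytes purely for illustration. In practice this kind of gate is usually enforced by the registry itself or by an admission controller rather than by hand-written code.

```python
import hashlib
from typing import Dict, Set


def digest_of(blob: bytes) -> str:
    """Format a SHA-256 digest the way image digests are usually written."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def is_approved(reference: str, approved: Dict[str, Set[str]]) -> bool:
    """Accept only references pinned to a digest that is on the allowlist."""
    repo, sep, digest = reference.partition("@")
    if not sep:
        # Tag-only references (e.g. ":latest") can be re-pointed later,
        # so this sketch rejects them outright.
        return False
    return digest in approved.get(repo, set())


if __name__ == "__main__":
    repo = "registry.internal.example:5000/library/nginx"  # hypothetical
    good = digest_of(b"stand-in for a scanned, signed image manifest")
    allowlist = {repo: {good}}

    print(is_approved(f"{repo}@{good}", allowlist))   # pinned and approved
    print(is_approved(f"{repo}:latest", allowlist))   # tag only, rejected
```

Pinning by digest rather than by tag is the design choice that matters here: a digest identifies exact content, so what was scanned is what runs.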

Network boundaries

There’s also always some sort of network boundary you must consider. Most systems are connected to other systems: your Kubernetes cluster must get applications, configuration, and data from somewhere. You must consider which network boundaries you’re going to set up for your clusters and how you’re going to expose your cluster to your larger IT system. It’s important to carefully design and architect your network to secure both the north-south traffic coming into your cluster and the east-west traffic between your applications within the cluster.
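One common starting point for east-west control is a default-deny NetworkPolicy in each namespace, with explicit allow rules layered on top. The Python sketch below builds such a manifest as a dictionary and prints it as JSON, which Kubernetes accepts alongside YAML. The namespace name `payments` and the policy name are hypothetical, and the policy only has an effect if the cluster’s network plugin enforces NetworkPolicy.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that blocks all ingress and egress by default."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            # Listing both types with no rules denies traffic in both directions.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    # "payments" is a hypothetical namespace used only for illustration.
    print(json.dumps(default_deny_policy("payments"), indent=2))
```

From there, each allowed path between applications is written down as its own policy, which doubles as documentation of how traffic is meant to flow inside the cluster.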

Now, given that most air-gapped systems are on-prem, there are other infrastructure-specific concerns you also must bear in mind that aren’t necessarily specific to air-gapped environments. For example, to avoid bottlenecking your performance, you want to think in advance about the hardware you’re going to have on-prem because it won’t be easy to roll out new hardware that meets your requirements quickly.

That’s a whole different beast, however, so for the purpose of this conversation, we won’t delve too deeply into that. But the takeaway is this: understand what infrastructure you already have and what you’re capable of reasonably expanding and growing in your air-gapped environment.

Documentation

Finally, this might seem mundane, but yes, you must also think about documentation in an air-gapped environment. Most of the documentation related to Kubernetes is online, and you need a connected network to read it. Plan to print or download it before you start deploying a cluster in an air-gapped environment.

Long story short, the key to not cornering yourself in an unsuccessful year-long attempt at deploying air-gapped Kubernetes is understanding the air-gapped system you’re trying to implement. You need to think about where you have access and where you don’t. What can you use? What can’t you use? And once you have all the infrastructure requirements and limitations in sight, you need to design the system with those things in mind.
