Edge computing’s biggest lie: “We’ll patch it later”
Edge computing is spreading fast, from factory floors to remote infrastructure. But many of these systems are hard to maintain once they are deployed. Devices may run old kernels, custom board support packages, or stacks that no one can rebuild years later. Updates can fail due to weak connectivity or power loss, and a mistake can brick thousands of systems at once. Add AI workloads that cannot tolerate downtime, and patching becomes even harder.
In this Help Net Security interview, Piotr Buliński, CTO of Qbee, digs into the edge equivalent of “snowflake servers,” why cloud habits break in the field, and what it takes to monitor and update fleets safely.

What is the edge equivalent of “snowflake servers” in the data center, and how does that problem scale into a fleet-wide risk?
In data centers, a ‘snowflake’ is a server whose unique configuration makes it impossible to reproduce. In the edge and embedded world, the equivalent is the ‘Frozen Device.’ Such devices run on a poorly maintained software stack – often a custom kernel or a proprietary Board Support Package (BSP) – that has become a black box over time.
The longer these devices are neglected, the worse the security risk. As a general rule, if you haven’t checked in on a device several years after deployment, you aren’t just managing hardware; you’re managing a liability.
Unlike in data centers, where on-site technical support (known as remote hands) can swap a drive or a serial console can rescue a bricked server, an edge device is often physically or geographically isolated and managed by non-technical personnel.
Pair this technical atrophy with a large number of unmaintained devices and risk can scale into a fleet-wide crisis. If the original build environment is gone, the toolchains are deprecated, or the security patches for that specific legacy kernel no longer exist, an entire fleet is a sitting duck for bad actors.
Under upcoming regulations like the EU’s Cyber Resilience Act (CRA), the inability to patch isn’t just a technical failure. It’s soon to become a compliance violation that carries significant financial and reputational penalties.
Companies can no longer optimize for ‘Day 1’ deployment while ignoring ‘Day 2’ lifecycle management. This shortsighted approach to IoT is essentially betting that the threat landscape won’t evolve. And let me tell you… it always does.
What is the most common false assumption teams make when they apply cloud operating models to edge environments?
The most dangerous assumption teams make is that they can apply a “Cloud MVP” (minimum viable product) approach – shipping something that’s just good enough, with the intention of figuring out the operational details after launch. In the cloud, a failed deployment is usually a five-minute fix. In edge computing, though, it can be a catastrophic, fleet-wide “bricking” event that renders a server, computer, or IoT device completely inoperable.
If you don’t architect for a production-grade foundation from Day 1 – specifically a robust OS and an atomic (all-or-nothing) OTA update mechanism – you are building on an insecure foundation. Unlike servers that are always reachable, edge devices may be powered off for months. If your update infrastructure can’t handle long-term dormancy or interrupted power, you’ll be stuck maintaining inferior legacy code for years just to keep the lights on.
Another aspect of this is environmental volatility. While cloud models assume “infinite” resources such as stable power and reliable high-speed networking, at the edge an update often fails due to a simple power outage or a flaky cellular link. Without a mechanism that can automatically revert to a known-working state, a failed update requires a physical site visit. The cost of sending an engineer into the field can instantly destroy the margins of a project.
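To make the failure mode concrete, here is a minimal Python sketch of an update agent built around that assumption: downloads can resume after power loss, and nothing touches the inactive slot until the image verifies. The URL, hash, paths, and slot layout are hypothetical placeholders, not any particular vendor’s update API.

```python
import hashlib
import os
import requests

# Hypothetical values for illustration only.
UPDATE_URL = "https://updates.example.com/fleet/image-2.4.1.img"
EXPECTED_SHA256 = "<sha256 published alongside the release>"
STAGING_PATH = "/data/ota/image-2.4.1.img.part"
INACTIVE_SLOT = "/dev/mmcblk0p3"  # the slot we are NOT currently booted from

def resumable_download(url: str, dest: str) -> None:
    """Resume a partial download after power loss or a dropped cellular link.
    Assumes the update server honours HTTP Range requests."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    headers = {"Range": f"bytes={offset}-"} if offset else {}
    with requests.get(url, headers=headers, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest, "ab") as f:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                f.write(chunk)

def verify(dest: str, expected: str) -> bool:
    """Never apply an image we cannot prove is intact."""
    digest = hashlib.sha256()
    with open(dest, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected

def apply_to_inactive_slot(dest: str, slot: str) -> None:
    """Write to the slot we are not running from; the booted slot stays
    untouched, so a failure here still leaves a known-good system in place."""
    with open(dest, "rb") as src, open(slot, "wb") as dst:
        for chunk in iter(lambda: src.read(1 << 20), b""):
            dst.write(chunk)

if __name__ == "__main__":
    resumable_download(UPDATE_URL, STAGING_PATH)
    if verify(STAGING_PATH, EXPECTED_SHA256):
        apply_to_inactive_slot(STAGING_PATH, INACTIVE_SLOT)
        # A real agent would now set a "boot once" flag so the bootloader
        # falls back to the old slot if the new one fails its health check.
```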
Perhaps the most overlooked area is observability and post-launch blindness. In a data center, teams can instrument monitoring post-launch because they always have eyes on the server. At the edge, if you don’t build monitoring into the initial image, you lose all visibility. If a bug prevents an update from rolling out and you don’t have the telemetry to see why, you lose the ability to mitigate the risk. Losing the ability to manage your assets can be just as serious as losing your data.
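As a rough illustration, the telemetry that has to ship in the first image can be very small. The sketch below is a hypothetical heartbeat agent; the endpoint, device ID, and reported fields are assumptions, not a prescribed schema.

```python
import json
import platform
import time
import urllib.request

# Illustrative endpoint; in practice this would be your own fleet backend,
# authenticated with the device's certificate.
TELEMETRY_URL = "https://fleet.example.com/api/heartbeat"
DEVICE_ID = "edge-node-0042"   # hypothetical identifier
INTERVAL_SECONDS = 300

def heartbeat() -> None:
    payload = {
        "device_id": DEVICE_ID,
        "ts": int(time.time()),
        "os_release": platform.release(),  # which kernel/image is actually live
        "uptime_s": time.clock_gettime(time.CLOCK_BOOTTIME),  # Linux-specific
    }
    req = urllib.request.Request(
        TELEMETRY_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=30)

if __name__ == "__main__":
    while True:
        try:
            heartbeat()
        except OSError:
            pass  # connectivity is intermittent by design; just try again later
        time.sleep(INTERVAL_SECONDS)
```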
Are you seeing cases where AI workloads at the edge force organizations to rethink patch windows entirely because they cannot afford downtime?
Absolutely. AI at the edge has fundamentally killed the traditional patch window. In a data center, a 10-minute reboot is an inconvenience; at the edge, if that AI is performing predictive maintenance on a high-speed production line or monitoring a power grid, 10 minutes of downtime creates a huge gap where teams are completely blind to potentially major failures. As a result, we are seeing organizations rethink their entire stack to achieve zero-downtime updates. This is driving two major shifts.
The first is decoupling the model from the OS. Instead of one monolithic firmware update, teams are moving toward containerized architectures. This allows them to ‘hot-swap’ AI models or security patches for the inference engine without rebooting the entire system or stopping the primary application.
The second is A/B redundancy as a standard. To avoid the risk of a botched update ‘bricking’ a critical AI node, we’re seeing the ‘Blue-Green’ deployment model move from the cloud to the edge. The device maintains two identical environments; the new model/patch is loaded onto the ‘passive’ side, verified, and then the system flips over in milliseconds.
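A stripped-down sketch of that flip, assuming two model directories and an inference runtime that follows an ‘active’ symlink (the paths and the smoke test are hypothetical), might look like this:

```python
import os

# Hypothetical layout: two complete model environments plus a symlink
# that tells the inference service which one is live.
SLOT_A = "/opt/models/slot_a"
SLOT_B = "/opt/models/slot_b"
ACTIVE_LINK = "/opt/models/active"

def passive_slot() -> str:
    """Whichever slot the 'active' symlink does not currently point to."""
    return SLOT_B if os.path.realpath(ACTIVE_LINK) == SLOT_A else SLOT_A

def smoke_test(slot: str) -> bool:
    """Placeholder: run a known input through the candidate model and check
    the output before trusting it with production traffic."""
    return os.path.exists(os.path.join(slot, "model.onnx"))

def flip_to(slot: str) -> None:
    """Atomically repoint the symlink; the old environment stays intact,
    so rolling back is just flipping the link again."""
    tmp = ACTIVE_LINK + ".new"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(slot, tmp)
    os.replace(tmp, ACTIVE_LINK)  # rename is atomic on POSIX filesystems

def deploy_new_model() -> None:
    target = passive_slot()
    # (new model files would be unpacked into `target` here)
    if smoke_test(target):
        flip_to(target)
    # On failure, nothing changed: the active slot keeps serving.
```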
If your edge AI solution requires you to stop the business to secure the device, you haven’t built a modern edge solution; you’ve created a bottleneck. The goal now is ‘in-flight patching’: the ability to evolve the system’s intelligence without ever losing sight of the physical assets it protects.
Many organizations want cloud-style CI/CD velocity at the edge. What part of the pipeline tends to break first when teams try to implement that?
In the cloud, Continuous Integration and Continuous Delivery (CI/CD) is a manageable software-to-software transaction in which environments are very predictable. At the edge, however, CI/CD hits a wall because of the validation gap: real-world device conditions can’t be reliably tested before deployment.
While building and shipping are largely solved problems, verifying that a build “works” before it hits the field is exponentially more complex. The pipeline typically breaks in two specific places:
The first is the flaky Hardware-in-the-Loop (HiL) trap. For teams to gain true confidence in their IoT deployments, they eventually have to move beyond software simulation and use actual hardware rigs for testing. This requires a significant infrastructure investment – often using tools like Labgrid (for automated hardware control) or custom test harnesses. The “break” happens when the test infrastructure itself becomes unreliable. If a cable is loose or a relay fails, teams get flaky tests. In a CI/CD world, flaky tests destroy developer trust and halt the pipeline.
Then we have the Security “Parity” Paradox. Teams often take shortcuts by disabling security features (like Secure Boot, disk encryption, or strict SELinux policies) in the test environment to make automation easier. This creates a dangerous ‘Parity Gap.’ It may sound obvious, but if you aren’t testing with the security features enabled, you aren’t testing the real thing. It is very common to see a build pass CI perfectly, only to fail in production because a new security policy blocked a critical process.
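One way to close that parity gap is to make the security posture itself part of the test suite. Below is a minimal sketch, assuming a Linux target with SELinux and UEFI Secure Boot; vendor-specific secure boot schemes would need their own checks.

```python
import glob
import pathlib
import pytest

def test_selinux_is_enforcing():
    """The test image must run the same SELinux mode as production."""
    enforce = pathlib.Path("/sys/fs/selinux/enforce")
    assert enforce.exists(), "SELinux is not loaded on this image"
    assert enforce.read_text().strip() == "1", "SELinux is permissive or disabled"

def test_secure_boot_is_enabled():
    """Read the UEFI SecureBoot variable; its last data byte is 1 when enabled."""
    matches = glob.glob("/sys/firmware/efi/efivars/SecureBoot-*")
    if not matches:
        pytest.skip("Non-UEFI board: verify the vendor's secure boot path instead")
    data = pathlib.Path(matches[0]).read_bytes()
    assert data[-1] == 1, "Secure Boot is disabled on the device under test"
```

Run against the same hardware rig used for functional tests, these checks fail the pipeline whenever someone quietly relaxes the security configuration to make automation easier.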
Ultimately, if your test environment doesn’t mirror the physical and security constraints of the field, your CI/CD pipeline is just a machine for shipping bugs faster. If you want predictable edge deployments, you have to treat your test hardware with the same rigor as your production hardware.
How do you build trust in fleet-wide monitoring when devices can be offline, partially compromised, or reporting inconsistent data?
Trust at the edge isn’t built on 100% uptime; it’s built on verifiable identity and contextual intelligence.
First, you must establish a cryptographically verifiable identity. Rather than just a serial number, you need a hardware-backed root of trust, using a Trusted Platform Module (TPM) or Trusted Execution Environment (TEE), to handle mutual Transport Layer Security (mTLS) certificates. This ensures that the data you’re seeing came from the device you think it did, preventing spoofed telemetry from compromised nodes.
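At the transport layer this usually comes down to plain mTLS. The sketch below uses file-based credentials purely for illustration; in a hardware-backed setup the private key would stay in the TPM/TEE and be exposed through something like a PKCS#11 provider rather than a file. The endpoint and paths are assumptions.

```python
import requests

# Hypothetical fleet endpoint and credential paths.
FLEET_API = "https://fleet.example.com/api/telemetry"
DEVICE_CERT = "/etc/device/identity.crt"   # per-device certificate
DEVICE_KEY = "/etc/device/identity.key"    # ideally held in a TPM/TEE instead of a file
FLEET_CA = "/etc/device/fleet-ca.pem"      # trust only our own CA, not the system store

def post_telemetry(payload: dict) -> None:
    # The client certificate proves the device's identity to the backend;
    # the pinned CA proves the backend's identity to the device.
    resp = requests.post(
        FLEET_API,
        json=payload,
        cert=(DEVICE_CERT, DEVICE_KEY),
        verify=FLEET_CA,
        timeout=30,
    )
    resp.raise_for_status()
```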
We also have to treat inconsistency as a diagnostic signal. A trusted device can still report bad data due to sensor failure or local tampering, and a mature monitoring system should automatically flag outliers by comparing a device’s behavior against its peer group or historical baseline. If one device in a fleet of 1,000 starts reporting impossible temperatures or strange access logs, that is your starting point for a targeted investigation.
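A simple form of that peer-group comparison is a robust outlier test over the fleet’s latest readings; the field names and threshold below are illustrative, not a recommended tuning.

```python
from statistics import median

def flag_outliers(readings: dict[str, float], threshold: float = 5.0) -> list[str]:
    """Flag devices whose latest reading sits far from the fleet median, using
    median absolute deviation so a few bad sensors can't skew the baseline."""
    values = list(readings.values())
    mid = median(values)
    mad = median(abs(v - mid) for v in values) or 1e-9  # avoid divide-by-zero
    return [
        device_id
        for device_id, value in readings.items()
        if abs(value - mid) / mad > threshold
    ]

# Example: one device in the fleet reporting an impossible temperature.
latest_temps = {"node-001": 41.2, "node-002": 40.8, "node-003": 39.9, "node-004": 612.0}
print(flag_outliers(latest_temps))  # ['node-004'] becomes the starting point for investigation
```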
Finally, the definition of ‘offline’ needs an update. In an edge environment, connectivity is often intermittent by design. Trust comes from being able to differentiate between expected silence (scheduled sleep or a known dead zone) and anomalous silence (a device that stopped checking in right after a failed login attempt). By providing granular information like this, we ensure that an engineer is only dispatched when absolutely necessary. Moreover, when the engineer arrives on-site, they already have the diagnostic data they need to fix the problem on the first visit.
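As a final sketch, the triage logic for silence can be as simple as comparing the gap since the last check-in with the device’s normal cadence and escalating only when the quiet period follows something suspicious; the fields and thresholds here are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class DeviceState:
    last_seen: datetime
    checkin_interval: timedelta          # how often this device normally reports
    in_known_dead_zone: bool = False     # e.g. scheduled sleep or a coverage gap
    recent_security_event: bool = False  # e.g. a failed login just before going quiet

def classify_silence(dev: DeviceState, now: datetime) -> str:
    overdue = now - dev.last_seen > 2 * dev.checkin_interval
    if not overdue:
        return "ok"
    if dev.in_known_dead_zone:
        return "expected-silence"        # no truck roll needed
    if dev.recent_security_event:
        return "anomalous-silence: investigate before dispatching"
    return "overdue: retry remotely, then schedule a visit with diagnostics attached"
```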