Where security, DevOps, and data science finally meet on AI strategy

AI infrastructure is expensive, complex, and often caught between competing priorities. On one side, security teams want strong isolation and boundaries. On the other, engineers push for performance, density, and cost savings. With GPUs in short supply and budgets under pressure, the balance isn’t easy.

In this Help Net Security interview, Andrew Hillier, CTO at Densify, explores how organizations can approach Kubernetes optimization with security, observability, and strategic maturity in mind, and why thinking in terms of “yield” may be the key to sustainable AI operations.


How can security-driven considerations, such as workload isolation or compliance, conflict with purely performance-based Kubernetes optimizations, and how should teams balance them?

Security and performance optimization often pull companies in opposite directions. This is no different with Kubernetes resource optimization. When you aim to purely drive up utilization and density, you naturally push workloads closer together and maximize resource sharing. To maximize security, you want isolation, dedicated resources and boundaries between workloads.

The key is to define isolation requirements upfront and then optimize aggressively within those constraints. Make the business trade-offs explicit and measurable. When teams try to optimize first and secure second, they usually have to redo everything. However, when they establish their security boundaries, the optimization work becomes more focused and effective.
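One way to make that boundary concrete before any density tuning starts is a per-team GPU quota on a dedicated namespace. The sketch below is a minimal illustration using the official Kubernetes Python client; the namespace name and quota values are assumptions, and the real numbers would come from the isolation requirements defined upfront.

```python
# Minimal sketch: expressing an isolation boundary as a namespaced GPU quota,
# using the official Kubernetes Python client. The namespace and limits are
# illustrative assumptions, not a recommendation for any specific environment.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a cluster
core = client.CoreV1Api()

quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="ai-team-gpu-quota"),
    spec=client.V1ResourceQuotaSpec(
        hard={
            "requests.nvidia.com/gpu": "4",  # cap GPU requests for this team
            "limits.memory": "256Gi",        # keep memory pressure bounded too
        }
    ),
)

core.create_namespaced_resource_quota(namespace="ai-team", body=quota)
```

With the boundary pinned down in a quota like this, density and bin-packing work can proceed inside it without renegotiating the security posture each time.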

What role do observability and workload profiling play in tuning AI pipelines within Kubernetes, and how do they intersect with cost and security controls?

Observability becomes critical when dealing with AI workloads because the cost of getting it wrong is higher. With traditional applications, a misconfiguration wastes some budget; with GPUs, the same mistake is many times more expensive.

The intersection with cost controls is immediate. You need visibility into whether your GPU resources are being utilized or just sitting idle. We’ve seen companies waste a significant portion of their budget on GPUs because they’ve never been appropriately monitored or because they are only utilized for short bursts, which makes it complex to optimize. We view this as a yield problem, and optimizing the yield on expensive assets like GPUs is often the goal, but this starts with measurement and visibility.
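As a rough illustration of what that measurement can look like, the sketch below assumes GPU metrics from NVIDIA's DCGM exporter are already being scraped into Prometheus (the endpoint and time window are placeholders) and pulls a per-GPU average utilization, which is the starting point for any yield calculation.

```python
# Minimal sketch: estimating GPU "yield" as average utilization reported by
# NVIDIA's DCGM exporter, queried through the Prometheus HTTP API. The
# Prometheus URL and the one-week window are illustrative assumptions.
import requests

PROM_URL = "http://prometheus.monitoring.svc:9090"  # hypothetical endpoint

# Average GPU utilization per GPU over the last 7 days (DCGM_FI_DEV_GPU_UTIL is 0-100).
query = "avg_over_time(DCGM_FI_DEV_GPU_UTIL[7d])"
resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": query}, timeout=30)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    gpu = series["metric"].get("gpu", "unknown")
    node = series["metric"].get("Hostname", series["metric"].get("instance", "unknown"))
    util = float(series["value"][1])
    print(f"node={node} gpu={gpu} avg_util_7d={util:.1f}%")
```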

Observability also helps you understand the difference between training workloads running at 100% utilization and inference workloads, where buffer capacity is needed for response times. Without visibility, teams either overprovision, causing massive waste, or underprovision inference workloads, causing a terrible user experience. Again, this is a GPU yield issue, where the maximum yield may be dictated by the use case, and the profiling data becomes part of the optimization strategy.

In some organizations, GPU hoarding is seen as a hedge against future AI needs. How does this behavior translate into wasted resources or even security risks?

The hoarding behavior is fascinating because it’s actually rational, given the supply constraints teams have faced. But it creates cascading problems beyond just the obvious.

From a resource perspective, we’ve seen companies hold onto much more GPU capacity than they actually need. Not because they’re using it, but because they’re afraid they won’t be able to get it when they do eventually need it, or that they won’t get it back if they let it go, even for a few hours. This hoarding is expensive and also strategically limiting. You’re tying up budget in idle resources instead of investing in new capabilities or expanding successful AI initiatives.

From a security perspective, the same dynamic that lets teams get away with hoarding is what creates the security concerns. AI initiatives are often extremely high priority, where the ends justify the means. This often makes cost control an afterthought, and the same dynamic can also cause other enterprise controls to be more lax as innovation and time to market dominate.

This can also lead to organizational risk: When you operate outside the established processes, you make capacity planning decisions in isolation rather than as part of a broader infrastructure strategy. This can lead to shadow IT problems and yet another layer of security concerns.

Companies must gain visibility into their actual capacity and use data to make more informed decisions about how much capacity they really need.

How does the move toward strategic maturity change the conversation between security teams, data scientists, and DevOps engineers?

As the use of AI becomes more mainstream, we’re seeing a shift from “AI at any cost” to “AI with accountability.” Data science teams were often given too much freedom early on, quickly spinning up massive GPU clusters without anyone raising questions. Now, platform engineering teams have been brought into the conversation to ask why millions are being spent on compute resources that might sit idle most of the time.

The conversation has become more collaborative out of necessity. Security teams understand the workload patterns, whether it’s a training job that should run at full capacity or an inference workload that requires response-time buffers. Data scientists have realized that they can’t simply request full GPUs when a quarter-GPU partition would suffice. Furthermore, DevOps teams have acknowledged the need for visibility into what is running on these costly resources, beyond just checking if they are operational.
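A quarter-GPU request is expressed the same way as a full-GPU one, just against a different resource name. The sketch below, again using the Kubernetes Python client with illustrative names, shows an inference pod asking for a single MIG slice instead of a whole device; the exact MIG profile depends on the GPU model and how the nodes are partitioned.

```python
# Minimal sketch: requesting a MIG slice rather than a full GPU for an inference pod.
# The image, namespace, and MIG profile name are illustrative assumptions; available
# profiles depend on the GPU model and the cluster's MIG configuration.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="inference-worker"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="model-server",
                image="registry.example.com/model-server:latest",  # placeholder image
                resources=client.V1ResourceRequirements(
                    # A 1g.5gb MIG slice instead of "nvidia.com/gpu": "1" (a full device).
                    limits={"nvidia.com/mig-1g.5gb": "1"},
                ),
            )
        ],
    ),
)

core.create_namespaced_pod(namespace="ai-inference", body=pod)
```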

The focus has moved from territorial disputes toward a shared responsibility for ensuring that AI infrastructure is financially sustainable.

Are there parallels between traditional manufacturing yield optimization and how we should approach AI infrastructure from both performance and security standpoints?

Absolutely. We use the term “yield” specifically when discussing GPU utilization because it accurately describes what’s happening. You want to maximize the output from these resources. Just like in manufacturing, you don’t want equipment sitting idle. However, the optimization strategy for AI workloads depends entirely on what you’re producing.

When training models, you can push GPUs to 100% utilization because it’s essentially a batch job. You can run them intensively until the training is complete. In contrast, for inference workloads that serve users, you need to maintain some buffer capacity. Running at 100% utilization means response times suffer for everyone.
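A back-of-the-envelope calculation makes the difference concrete. The sketch below (with made-up numbers) treats yield as useful GPU-hours over provisioned GPU-hours, with inference capped by a headroom target rather than by the hardware.

```python
# Back-of-the-envelope GPU yield: useful GPU-hours / provisioned GPU-hours.
# All numbers are made up for illustration.

def gpu_yield(busy_gpu_hours: float, provisioned_gpu_hours: float) -> float:
    """Fraction of provisioned GPU time that did useful work."""
    return busy_gpu_hours / provisioned_gpu_hours

# Training: a batch job can run the hardware flat out until it finishes.
training = gpu_yield(busy_gpu_hours=152.0, provisioned_gpu_hours=168.0)

# Inference: a headroom target (say 30%) caps achievable utilization,
# so the realistic ceiling on yield is lower by design.
headroom = 0.30
inference = gpu_yield(busy_gpu_hours=96.0, provisioned_gpu_hours=168.0)
inference_ceiling = 1.0 - headroom

print(f"training yield:  {training:.0%}")
print(f"inference yield: {inference:.0%} (ceiling ~{inference_ceiling:.0%} with {headroom:.0%} headroom)")
```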

The manufacturing analogy also comes into play with respect to data prep. If your expensive machines sit idle because the raw materials needed as inputs aren’t available, then your yield goes down. The same is true of AI training workloads. If the data prep phase causes GPUs to sit idle, then the GPU yield is reduced. The resources aren’t used in isolation, and the entire system must be optimized.
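One common fix on the training side is to overlap data preparation with GPU work instead of serializing them. The sketch below assumes a PyTorch training loop and shows the standard knobs (worker processes, pinned memory, prefetching) that keep the accelerator from waiting on input data; the dataset and batch size are placeholders.

```python
# Minimal sketch: keeping GPUs fed by overlapping data prep with training.
# Assumes PyTorch; the dataset and batch size are placeholders.
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(10_000, 128), torch.randint(0, 10, (10_000,)))

loader = DataLoader(
    dataset,
    batch_size=256,
    shuffle=True,
    num_workers=4,       # prepare batches in parallel worker processes
    pin_memory=True,     # faster host-to-GPU copies
    prefetch_factor=2,   # each worker keeps batches queued ahead of the GPU
)

device = "cuda" if torch.cuda.is_available() else "cpu"
for features, labels in loader:
    # non_blocking copies overlap with compute when memory is pinned
    features = features.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    # ... forward/backward pass would go here ...
```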

And the security aspect comes into play when you start partitioning GPUs and time-slicing workloads. It’s critical to ensure isolation between different AI services while still maximizing overall yield. The goal is to find the right balance: not hoarding four times the capacity you need, but also not running so lean that performance suffers.
