Komodor unveils Klaudia AI extensibility framework to power multi-agent incident resolution

Komodor has unveiled a new extensibility framework that turns its Klaudia AI technology into a universal multi-agent platform for troubleshooting and optimizing the performance of complex cloud-native infrastructure and applications.

The new architecture lets organizations extend Klaudia AI with their own tools, services, and agents, and combine them with the more than 50 specialized agents Komodor already provides. With these multi-agent orchestration capabilities, teams can automate the investigation and remediation of operational issues across infrastructure layers, including Kubernetes, GPUs, networking, and storage.

The announcement marks the next step in Komodor’s evolution from automated troubleshooting to a fully extensible autonomous AI SRE platform.

In cloud-native environments, issues that appear to originate in application workloads often stem from interconnected infrastructure components across Kubernetes clusters, networking systems, GPUs, and external services. Resolving these typically requires multiple engineers examining different layers of the stack at the same time. Komodor’s new orchestration architecture mirrors this collaborative model by coordinating specialized AI agents that work in parallel, so multi-domain incidents can be investigated continuously at machine speed.

“Most AI tools for operations focus on summarizing telemetry rather than resolving incidents, but complex outages require specialists from multiple domains working together to understand what’s happening across the stack,” said Itiel Shwartz, CTO of Komodor. “The Komodor platform’s new extensible architecture replicates this collaborative process using specialized agents that encode operational knowledge and work together to diagnose and resolve issues.”

Extending Klaudia AI with specialized agents

The Komodor platform introduces a modular architecture that orchestrates multiple AI agents, each responsible for a specific operational role. Workflow agents coordinate key reliability engineering processes such as detection, investigation, and remediation. They can also dynamically invoke specialized Subject Matter Expert Agents (SMEs) that bring deep expertise in specific technologies or domains such as Kubernetes, AWS services, GPUs, or deployment tools.

According to Komodor, this architecture lets Klaudia AI retrieve precise context at the point in an investigation where it is needed, avoiding the hallucinations and data overload that often limit general-purpose AI assistants. Using the extensible architecture, Komodor has already developed more than 50 specialized agents across operational domains, enabling the platform to troubleshoot issues that extend beyond Kubernetes clusters into the broader cloud-native infrastructure stack.
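The orchestration pattern described in this section, a workflow agent that invokes only the relevant specialists and runs them in parallel, can be illustrated with a minimal Python sketch. This is not Komodor's implementation or API: the class names (`WorkflowAgent`, `SMEAgent`, `Finding`), the keyword-based relevance check, and the placeholder investigation logic are all assumptions made for illustration.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Finding:
    domain: str
    summary: str


class SMEAgent:
    """Hypothetical subject-matter-expert agent covering one domain."""

    def __init__(self, domain: str, keywords: tuple[str, ...]):
        self.domain = domain
        self.keywords = keywords

    def is_relevant(self, symptom: str) -> bool:
        # Assumption: relevance is a simple keyword match on the symptom text.
        text = symptom.lower()
        return any(keyword in text for keyword in self.keywords)

    async def investigate(self, symptom: str) -> Finding:
        await asyncio.sleep(0)  # stand-in for real diagnostic I/O
        return Finding(self.domain, f"{self.domain}: checked '{symptom}'")


class WorkflowAgent:
    """Hypothetical coordinator that fans out to relevant specialists."""

    def __init__(self, specialists: list[SMEAgent]):
        self.specialists = specialists

    async def run(self, symptom: str) -> list[Finding]:
        relevant = [s for s in self.specialists if s.is_relevant(symptom)]
        # Run the selected specialists concurrently; gather keeps input order.
        results = await asyncio.gather(*(s.investigate(symptom) for s in relevant))
        return list(results)


def investigate(symptom: str) -> list[str]:
    """Return the domains of the specialists consulted for a symptom."""
    agents = [
        SMEAgent("kubernetes", ("pod", "node", "crashloop")),
        SMEAgent("gpu", ("gpu", "cuda")),
        SMEAgent("networking", ("dns", "timeout", "latency")),
    ]
    findings = asyncio.run(WorkflowAgent(agents).run(symptom))
    return [finding.domain for finding in findings]
```

Calling `investigate("pod restarts after GPU driver timeout")` in this sketch consults the Kubernetes, GPU, and networking specialists, while a symptom that matches no keywords consults none. A production system would presumably replace the keyword match with a far richer selection step.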

Built to support existing IT stacks

Komodor’s extensibility framework lets organizations bring their own services, tools, and agents via the Model Context Protocol (MCP) or an OpenAPI specification. Klaudia AI orchestrates these alongside its native specialists within the same investigation workflow, giving it a fuller picture of the issue before it runs remediation plans.

Early adopters are already using the framework to extend Klaudia AI with custom agents tailored to their environments. Examples include agents that:

  • Cross-reference CI/CD pipelines to correlate failures with recent code or configuration changes across microservices
  • Integrate with database management tools to determine whether application latency traces back to query performance or connection pool exhaustion
  • Query past incident channels to surface how similar symptoms were resolved in previous outages
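To make the OpenAPI route more concrete, the sketch below builds a minimal OpenAPI 3.0 description for a hypothetical internal service that reports database connection pool statistics, the kind of tool the second example above might wrap. The service, its path, and the `getPoolStats` operation are invented for illustration, and the article does not describe how such a specification is registered with Klaudia AI, so that step is omitted.

```python
# Minimal OpenAPI 3.0 document for a hypothetical pool-statistics service.
# Every name below (title, path, operationId) is an illustrative assumption.
POOL_STATS_SPEC = {
    "openapi": "3.0.3",
    "info": {"title": "Connection Pool Stats (hypothetical)", "version": "1.0.0"},
    "paths": {
        "/pools/{service}/stats": {
            "get": {
                "operationId": "getPoolStats",
                "summary": "Return connection pool usage for one service",
                "parameters": [
                    {
                        "name": "service",
                        "in": "path",
                        "required": True,
                        "schema": {"type": "string"},
                    }
                ],
                "responses": {
                    "200": {"description": "Current pool usage"},
                },
            }
        }
    },
}

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(spec: dict) -> list[tuple[str, str, str]]:
    """List (METHOD, path, operationId) for each operation in the spec."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            # Path items may also hold non-operation keys such as "parameters".
            if method in HTTP_METHODS:
                operations.append(
                    (method.upper(), path, operation.get("operationId", ""))
                )
    return operations
```

Listing the operations this way is a quick check on what a specification would expose to an orchestrator before it is handed over.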