Elastic brings AI-driven incident investigation to Kubernetes and observability tools

Elastic has introduced an agentic Kubernetes investigation workflow and MCP-based observability skills that diagnose incidents the moment an alert fires. By the time an SRE opens the alert, the root cause has already been identified, evidence has been assembled, and recommended next steps have been surfaced.


For teams running Kubernetes at scale, the gap between alert and answer costs time, compounds outages, and wears down on-call engineers. Elastic closes that gap by starting the investigation automatically, before anyone is paged.

Elastic Observability builds on Kubernetes dashboards, prebuilt alert templates, and ML-powered anomaly detection to offer two ways to move faster from alert to resolution. The first is an agentic investigation workflow that runs diagnostics automatically when alerts fire. The second is a Kubernetes MCP App with skills that brings the same investigation capabilities into the AI tools and IDEs engineers already use, including Claude, Cursor, VS Code, and any MCP-compatible client.

The Elastic Observability MCP App lets SREs investigate Kubernetes environments conversationally, with AI agents querying live data from Elasticsearch and surfacing fully interactive views directly in the tool: cluster health rollups, service dependency graphs, anomaly detail with actual versus typical values, blast radius analysis for node failures, and persistent alert rule management.

Elasticsearch stores all Kubernetes logs and metrics at scale, with what Elastic says is 2.5x better storage efficiency than other observability vendors, so engineers have access to the full operational context needed to investigate incidents. Whether the agentic workflow delivers a confirmed root cause or a structured starting point for continued investigation, SREs never start from scratch.

“Engineers who get paged at 3 a.m. don’t want to start a new investigation from scratch, they want answers,” said Bahaaldine Azarmi, GM, Observability at Elastic. “With this release, Elastic kicks off the investigation the moment an alert fires, so teams reach resolution faster and with more confidence. And because it runs inside the tools engineers already use, there’s no context switch and no new interface to learn.”
