When your AI assistant has the keys to production

Large language models in operational roles started with ticket drafting and alert summarization. They now query telemetry, propose configuration changes, and in some deployments execute those changes against live infrastructure. Vendors describe this work as autonomous remediation or self-healing infrastructure. A recent survey on agentic AI in network and IT operations gives it a more useful name: a confused-deputy problem waiting to happen.


The confused-deputy problem in agentic AI security

The classic confused-deputy attack tricks an authorized program into misusing its privileges. Agentic operations create an ideal substrate for this kind of abuse. The agent holds legitimate access to change-management APIs, deployment pipelines, and network controllers. Its decisions are shaped by tickets, runbooks, chat transcripts, and log entries, which are the same artifacts an attacker can influence. Compromising the tool is unnecessary when an attacker can compromise the text the agent reads before it uses the tool.

Four attack categories targeting LLM operations

The survey catalogs several attack categories that deserve more attention. Prompt injection through operational artifacts is the most familiar: malicious instructions embedded in a ticket or wiki page that steer the agent toward an unsafe action. Subtler variants exist. Retrieval poisoning corrupts the runbooks and incident histories the agent consults, biasing its diagnoses toward attacker-chosen conclusions.

Retrieval jamming works in the opposite direction, flooding the knowledge base with blocker documents that trigger refusal loops and stall incident response when it is most needed. Telemetry manipulation targets the data that LLM-driven operations agents act on: an attacker who can influence what metrics and logs say can steer mitigation decisions without touching the model.

These attacks are operationally dangerous because they do not look like attacks. They look like normal incident response that happens to go wrong.

The propose-commit split as an architectural defense

The defense proposed by the survey is architectural. The authors argue for a strict propose-commit split: the language model can reason, retrieve evidence, and draft change proposals, but it cannot execute writes. Every action that touches production passes through a non-bypassable gate the model has no authority over. The gate covers policy-as-code checks, invariant verification, human approval for high-blast-radius changes, and rollback-ready staged deployment.
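A minimal Python sketch of what such a gate could look like. Everything here is an illustrative assumption rather than the survey's design: the `ChangeProposal` fields, the two example policy checks, the protected target names, and the numeric approval threshold are all hypothetical. The point it shows is structural: the model's output is plain data, and only the gate decides whether it may proceed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ChangeProposal:
    """A change drafted by the model. It is data only and carries no execution authority."""
    target: str                       # hypothetical resource name
    diff: str                         # the proposed change as text
    blast_radius: int                 # hypothetical estimate of affected hosts
    rollback_plan: Optional[str] = None


@dataclass(frozen=True)
class GateDecision:
    allowed: bool
    reasons: List[str]


# A policy check returns None when it passes, or a string describing the violation.
PolicyCheck = Callable[[ChangeProposal], Optional[str]]


def require_rollback(proposal: ChangeProposal) -> Optional[str]:
    """Hypothetical policy: every change must ship with a rollback plan."""
    return None if proposal.rollback_plan else "no rollback plan attached"


def deny_protected_targets(proposal: ChangeProposal) -> Optional[str]:
    """Hypothetical policy: some targets are never changed through this path."""
    protected = {"core-dns", "auth-db"}  # assumed names, for illustration only
    if proposal.target in protected:
        return f"target {proposal.target} is protected"
    return None


class CommitGate:
    """Decides whether a proposal may be applied. The model never calls apply itself."""

    def __init__(self, checks: List[PolicyCheck], approval_threshold: int) -> None:
        self._checks = list(checks)
        self._approval_threshold = approval_threshold

    def evaluate(self, proposal: ChangeProposal, human_approved: bool = False) -> GateDecision:
        reasons: List[str] = []
        for check in self._checks:
            violation = check(proposal)
            if violation is not None:
                reasons.append(violation)
        # High-blast-radius changes need an explicit human approval flag.
        if proposal.blast_radius >= self._approval_threshold and not human_approved:
            reasons.append("human approval required for high-blast-radius change")
        return GateDecision(allowed=not reasons, reasons=reasons)


gate = CommitGate([require_rollback, deny_protected_targets], approval_threshold=50)
proposal = ChangeProposal("edge-router-7", "+ mtu 9000", blast_radius=200,
                          rollback_plan="revert to previous config")
decision = gate.evaluate(proposal)
print(decision.allowed, decision.reasons)
```

In a real deployment the separation would have to be enforced by credentials and network boundaries, not by a class in the same process; the sketch only illustrates the control flow.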

The model’s job is to draft a diff. The gate’s job is to decide whether that diff is allowed to apply. Integrity-protected audit logs round out the control set, so that post-incident forensics can reconstruct what happened.
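One common way to make an audit log tamper-evident is a hash chain, in which each entry's hash covers the previous entry's hash. The sketch below is an assumption about how integrity protection might be implemented, not something the survey specifies; the `AuditLog` class and the record fields are hypothetical.

```python
import hashlib
import json
from typing import List, Tuple

GENESIS = "0" * 64  # fixed starting value for the chain


def _digest(prev_hash: str, payload: str) -> str:
    """Hash the previous entry's hash together with this entry's serialized record."""
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log in which every entry commits to all entries before it."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, str]] = []  # (serialized record, entry hash)

    def append(self, record: dict) -> str:
        # Serialize with sorted keys so the same record always hashes the same way.
        payload = json.dumps(record, sort_keys=True)
        prev_hash = self._entries[-1][1] if self._entries else GENESIS
        entry_hash = _digest(prev_hash, payload)
        self._entries.append((payload, entry_hash))
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited, removed, or reordered entry breaks it."""
        prev_hash = GENESIS
        for payload, entry_hash in self._entries:
            if _digest(prev_hash, payload) != entry_hash:
                return False
            prev_hash = entry_hash
        return True


log = AuditLog()
log.append({"actor": "agent", "action": "propose", "change_id": "c-1"})
log.append({"actor": "gate", "action": "reject", "change_id": "c-1"})
print(log.verify())
```

A chain like this only detects tampering if the latest hash is also stored somewhere the attacker cannot rewrite, such as a separate write-once system. An attacker who controls the whole log could otherwise recompute every hash.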

The limits of prompt-based agentic AI security

This architecture matters because prompt-only defenses are brittle. Any system where the model’s text generation can directly cause production changes has built its security perimeter inside the most unpredictable component in the stack. The OWASP excessive-agency pattern, the survey notes, is in practice a failure to implement the propose-commit split cleanly.

The missing evidence for safe LLM autonomy

A measurement problem sits alongside the architectural one. Many claims about safe agentic operations cannot be falsified because the supporting evidence is missing. The survey identifies what evaluations should report: tool-call traces, gate-violation rates, behavior under adversarial inputs, refusal-storm rates under jamming attacks, and rollback completeness. Most current benchmarks omit these. A system that performs well on clean incidents may collapse the moment someone embeds a hostile instruction in a Jira ticket. Security teams evaluating agentic products should ask for adversarial evaluation data alongside success metrics on benign workloads.
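To make the reporting items concrete, here is a rough Python sketch of how a few of them could be computed from per-episode traces. The `EpisodeTrace` fields and the metric definitions (rejected proposals over total proposals, refused episodes over adversarial episodes, successful rollbacks over attempted rollbacks) are assumptions made for illustration; the survey names the metrics but the formulas here are not taken from it.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class EpisodeTrace:
    """Hypothetical summary of one evaluated incident."""
    adversarial: bool          # did the episode contain hostile inputs?
    proposals: int             # change proposals the agent submitted
    gate_violations: int       # proposals the gate rejected
    refused: bool              # did the agent refuse to act at all?
    rollback_attempted: bool
    rollback_succeeded: bool


def _rate(numerator: int, denominator: int) -> Optional[float]:
    """Return a ratio, or None when there is nothing to measure."""
    return numerator / denominator if denominator else None


def summarize(traces: Iterable[EpisodeTrace]) -> Dict[str, Optional[float]]:
    traces = list(traces)
    benign = [t for t in traces if not t.adversarial]
    hostile = [t for t in traces if t.adversarial]
    attempted = [t for t in traces if t.rollback_attempted]
    return {
        # Reported separately so clean-incident results cannot mask adversarial ones.
        "gate_violation_rate_benign": _rate(
            sum(t.gate_violations for t in benign), sum(t.proposals for t in benign)),
        "gate_violation_rate_adversarial": _rate(
            sum(t.gate_violations for t in hostile), sum(t.proposals for t in hostile)),
        "refusal_rate_adversarial": _rate(
            sum(1 for t in hostile if t.refused), len(hostile)),
        "rollback_completeness": _rate(
            sum(1 for t in attempted if t.rollback_succeeded), len(attempted)),
    }


report = summarize([
    EpisodeTrace(False, 4, 0, False, True, True),
    EpisodeTrace(True, 4, 2, False, True, False),
    EpisodeTrace(True, 0, 0, True, False, False),
])
print(report)
```

Splitting every rate by benign versus adversarial episodes is the design choice that matters here: a single blended number would hide exactly the failure mode the paragraph above warns about.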

Where autonomy earns trust and where it does not

The amount of autonomy an agent has is the amount of damage it can do when things go sideways. Read-only assistance is useful and low-risk. Bounded execution with strong gates is defensible. Open-ended self-healing across large production environments, without the verification scaffolding the survey describes, is a harder problem than current deployments make it sound, and claims about it deserve skepticism.
