When AI agents go rogue, the fallout hits the enterprise

In this Help Net Security interview, Jason Lord, CTO at AutoRABIT, discusses the cybersecurity risks posed by AI agents integrated into real-world systems. Issues like hallucinations, prompt injections, and embedded biases can turn these systems into vulnerable targets.

Lord calls for oversight, continuous monitoring, and human-in-the-loop controls to combat these threats.


Many AI agents are built on foundation models or LLMs. How do the inherent unpredictabilities of these models—like hallucinations or prompt injections—translate into risks when agents are embedded in real-world systems?

Foundation models and LLMs learn from vast amounts of data, meaning any underlying bias, low-quality input, or factual errors become embedded in their behavior. Over time, these inaccuracies can compound—especially as the model encounters more diverse contexts.

While the inherent unpredictability of LLMs may be useful for creative or conversational purposes, it can expose significant risks in production environments. Hallucinations (fabrications or incorrect statements) and prompt injections (malicious manipulations of the input) can introduce hidden vulnerabilities.

In a worst-case scenario, these issues become attack vectors that provide unintended access to critical systems or sensitive data. Even minor inconsistencies in an AI agent’s responses could compromise data integrity or open backdoors. That creates real risks ranging from unauthorized disclosure to system corruption—all of which can severely impact an organization’s security posture.

If AI agents are given access to enterprise systems, tools, or data, what kinds of new attack surfaces or supply chain risks do they introduce? Are we essentially creating new privileged identities?

Yes, AI agents effectively become a new class of privileged identities with potential access to sensitive information and critical business workflows. Because they can process commands, retrieve or modify data, and interface with other enterprise systems, a single breach of an AI agent’s credentials or logic can be as damaging as compromising an entire privileged user account.

What makes AI agents particularly challenging is their unpredictable behavior and the difficulty of auditing them compared to conventional scripts or applications. They rely on natural language inputs that can be manipulated, whether through prompt injection, adversarial examples, or misconfigured access controls, to perform unwanted or malicious actions.

Organizations should also consider AI-driven supply chain risks. When AI agents generate code or produce artifacts that feed into CI/CD pipelines, they can unintentionally embed vulnerabilities or propagate flawed logic throughout downstream systems. These hidden risks can persist until discovered by thorough audits or exploit attempts, emphasizing the need for strict governance, access control, and continuous monitoring of AI-generated outputs.
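As a rough illustration of that kind of governance, a pipeline gate along the lines of the Python sketch below could hold unreviewed AI-generated artifacts before they move downstream. The marker string, file handling, and "risky pattern" list are assumptions made for the example, not a description of any specific tool.

```python
# Hypothetical CI gate: treat AI-generated artifacts as untrusted until reviewed.
# The marker annotation and pattern list below are illustrative assumptions.
import re
import sys
from pathlib import Path

AI_GENERATED_MARKER = "Generated-by: ai-agent"   # assumed commit/file annotation
RISKY_PATTERNS = [
    r"eval\(", r"exec\(", r"subprocess\.",        # dynamic execution
    r"(?i)api[_-]?key\s*=\s*['\"]",               # hard-coded secrets
    r"verify\s*=\s*False",                        # disabled TLS verification
]

def scan_artifact(path: Path) -> list[str]:
    """Return the risky patterns found in a single AI-generated file."""
    text = path.read_text(errors="ignore")
    return [p for p in RISKY_PATTERNS if re.search(p, text)]

def main(changed_files: list[str]) -> int:
    findings = {}
    for name in changed_files:
        path = Path(name)
        if not path.is_file():
            continue
        # Only gate files explicitly marked as AI-generated.
        if AI_GENERATED_MARKER not in path.read_text(errors="ignore"):
            continue
        hits = scan_artifact(path)
        if hits:
            findings[name] = hits
    if findings:
        for name, hits in findings.items():
            print(f"BLOCK {name}: requires human review ({', '.join(hits)})")
        return 1  # non-zero exit fails the pipeline stage
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```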

Do you foresee AI agents becoming targets of adversarial attacks in the same way humans are through phishing or social engineering? What might “agent-aware” adversaries look like?

Absolutely. AI agents will simply be added to the list of targets, but instead of using tactics that manipulate humans, like phishing emails, attackers may leverage prompt injection, crafting malicious inputs designed to manipulate an agent’s behavior. These attacks exploit the agent’s trust in seemingly benign instructions or data, causing it to leak information, escalate privileges, or take harmful actions.

“Agent-aware” adversaries will study how specific agents interpret language, make decisions, and interact with tools. They might embed hidden commands in shared documents, chat messages, or API responses. These attacks won’t look like traditional exploits—they’ll look like normal conversations or routine data flows.

As AI agents take on more responsibility in enterprise systems—handling tickets, updating records, or managing infrastructure—attackers will focus more attention on manipulating the agents directly. This makes AI-specific security controls, such as input validation, behavior monitoring, and robust audit trails, critical for detecting and preventing these attacks.
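A minimal sketch of those controls, assuming a simple Python wrapper around an agent's tool calls: input validation that rejects obvious injection phrasing, plus an audit record for every call. The marker phrases, tool names, and log destination are illustrative assumptions, not a reference to any particular product.

```python
# Sketch of "agent-aware" controls: validate inputs before they reach the agent
# and keep an audit trail of every tool call. Patterns and names are assumed.
import json
import logging
import re
from datetime import datetime, timezone

logging.basicConfig(filename="agent_audit.log", level=logging.INFO)

INJECTION_MARKERS = [
    r"(?i)ignore (all )?previous instructions",
    r"(?i)disregard your system prompt",
    r"(?i)reveal your (hidden|system) prompt",
]

def validate_input(text: str) -> None:
    """Reject inputs that contain obvious prompt-injection phrasing."""
    for pattern in INJECTION_MARKERS:
        if re.search(pattern, text):
            raise ValueError(f"possible prompt injection: {pattern}")

def audited_tool_call(agent_id: str, tool: str, user_input: str, handler) -> str:
    """Run a tool on behalf of an agent, with validation and an audit record."""
    validate_input(user_input)
    result = handler(user_input)
    logging.info(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent_id,
        "tool": tool,
        "input": user_input,
        "result_preview": str(result)[:200],
    }))
    return result

# Example usage with a stand-in tool:
if __name__ == "__main__":
    lookup = lambda q: f"ticket status for '{q}': open"
    print(audited_tool_call("triage-agent", "ticket_lookup", "INC-1042", lookup))
```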

What use cases for AI agents in cybersecurity excite you the most right now—and which do you think are still overhyped or premature?

AI has come a long way over the last few years. Threat detection and automated response are two areas where AI agents can be a real asset to a cybersecurity team. Agents that triage alerts, correlate signals across diverse tools, and provide enriched context can dramatically enhance an analyst’s efficiency. By augmenting existing SOC workflows, AI agents help staff focus on strategic tasks rather than sifting through raw data.
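To make the triage idea concrete, here is an illustrative Python sketch that groups raw alerts by host and attaches threat-intel context before handing the case to an analyst. The alert fields and the intel mapping are assumptions for the example, not a real schema or feed.

```python
# Illustrative only: correlate alerts per host and enrich indicators with
# threat-intel context so an analyst sees one case instead of many raw events.
from collections import defaultdict

THREAT_INTEL = {  # assumed indicator -> context mapping
    "185.220.101.4": "known Tor exit node",
    "bad-domain.example": "flagged in recent phishing campaign",
}

def triage(alerts: list[dict]) -> list[dict]:
    """Group alerts by host, enrich indicators, and rank the resulting cases."""
    by_host = defaultdict(list)
    for alert in alerts:
        by_host[alert["host"]].append(alert)

    cases = []
    for host, items in by_host.items():
        indicators = {a.get("indicator") for a in items if a.get("indicator")}
        cases.append({
            "host": host,
            "alert_count": len(items),
            "severity": max(a["severity"] for a in items),
            "intel": {i: THREAT_INTEL.get(i, "no match") for i in indicators},
        })
    # Highest-severity, most-correlated cases surface first for the analyst.
    return sorted(cases, key=lambda c: (c["severity"], c["alert_count"]), reverse=True)

if __name__ == "__main__":
    sample = [
        {"host": "srv-01", "severity": 3, "indicator": "185.220.101.4"},
        {"host": "srv-01", "severity": 5, "indicator": "bad-domain.example"},
        {"host": "wks-07", "severity": 2, "indicator": None},
    ]
    for case in triage(sample):
        print(case)
```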

However, fully autonomous AI agents capable of making unilateral decisions in live production environments remain risky. Hallucinated or incorrect actions could shut down vital infrastructure or create backdoors without human operators catching the error in time. Oversight mechanisms, manual approvals, and robust guardrails are still essential.

Overhyped or premature uses include end-to-end “self-healing” systems with no human intervention. In dynamic security landscapes, ensuring accountability and mitigating unintended consequences requires a human review process. Pairing AI agents with human-in-the-loop strategies will likely remain the best practice for some time.
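One way such a human-in-the-loop guardrail might look in practice, sketched in Python under assumed action names and risk tiers: low-risk steps run automatically, while anything destructive waits for explicit analyst approval.

```python
# Minimal human-in-the-loop gate. Action names and the risk tier are assumptions.
HIGH_RISK_ACTIONS = {"isolate_host", "disable_account", "rollback_deployment"}

def execute_action(action: str, target: str, approver=input) -> str:
    """Run a remediation step, pausing for human approval on high-risk actions."""
    if action in HIGH_RISK_ACTIONS:
        answer = approver(f"Agent proposes '{action}' on {target}. Approve? [y/N] ")
        if answer.strip().lower() != "y":
            return f"DENIED: {action} on {target} (awaiting analyst follow-up)"
    # In a real SOC this would call the relevant tool API; here we just report it.
    return f"EXECUTED: {action} on {target}"

if __name__ == "__main__":
    print(execute_action("enrich_alert", "INC-2203"))   # runs without approval
    print(execute_action("isolate_host", "wks-07"))     # prompts the analyst
```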

Looking ahead 3–5 years, what does a mature AI-agent-enabled SOC look like? What capabilities will be commonplace, and what risks will we still be grappling with?

We’ve seen so much growth in the last year alone that I imagine AI agents will be very impressive in another 3–5 years. I expect AI-agent-enabled SOCs to operate with semi-autonomous agents embedded across the incident lifecycle. They’ll be able to triage alerts, generate incident reports, and implement remediation actions.

Cross-tool integration (e.g., with vulnerability scanners, EDR solutions, and threat intel feeds) is another capability that will mature over the coming years. Human analysts will be free to focus more on strategic analysis, fine-tuning the behavior of their AI agents, and attending to unique cybersecurity circumstances.

The increase in maturity will be met with equally mature attacks and cybersecurity risks. Maintaining a strong security posture will require not just advanced AI but also continuous validation, red-teaming, and careful governance. No tool is a magic bullet—especially in a constantly evolving threat landscape.
