LlamaFirewall: Open-source framework to detect and mitigate AI centric security risks

LlamaFirewall is a system-level security framework for LLM-powered applications, built with a modular design to support layered, adaptive defense. It is designed to mitigate a wide spectrum of AI agent security risks including jailbreaking and indirect prompt injection, goal hijacking, and insecure code outputs.

LlamaFirewall

Why Meta created LlamaFirewall

LLMs are moving far beyond simple chatbot use cases and becoming core components of high-trust, autonomous systems. With this growing sophistication comes a corresponding rise in security risk. “LLMs have advanced to the point where they can function as autonomous agents, yet existing safety measures were never designed with this level of capability in mind,” Sahana Chennabasappa, Security Engineer at Meta, told Help Net Security. That disconnect is creating dangerous blind spots in how organizations secure these systems.

One particularly concerning area is the use of LLMs in coding applications. “Coding agents that rely on LLM-generated code may inadvertently introduce security vulnerabilities into production systems,” Chennabasappa warned. “Misaligned multi-step reasoning can also cause agents to perform operations that stray far beyond the user’s original intent.” These types of risks are already surfacing in coding copilots and autonomous research agents, she added, and are only likely to grow as agentic systems become more common.

Yet while LLMs are being embedded deeper into mission-critical workflows, the surrounding security infrastructure hasn’t kept pace. “Security infrastructure for LLM-based systems is still in its infancy,” Chennabasappa said. “So far, the industry’s focus has been mostly limited to content moderation guardrails meant to prevent chatbots from generating misinformation or abusive content.”

That approach, she argued, is far too narrow. It overlooks deeper, more systemic threats like prompt injection, insecure code generation, and abuse of code interpreter capabilities. Even proprietary safety systems that hardcode rules into model inference APIs fall short, according to Chennabasappa, because they lack the transparency, auditability, and flexibility needed to secure increasingly complex AI applications.

In response, Chennabasappa and her team have developed LlamaFirewall, a new system-level security architecture designed specifically for LLM-based agents. “LlamaFirewall orchestrates defenses in tandem with guardrails to address emerging threats that traditional chatbot-centric safeguards simply weren’t built to handle,” she explained.

What makes LlamaFirewall unique

Chennabasappa explains that LlamaFirewall includes three guardrails tailored to the needs of LLM agent workflows that fall into two categories: prompt injection/agent misalignment and insecure/dangerous code. The three guardrails include:

1. PromptGuard 2, a universal jailbreak detector that detects direct jailbreak attempts with high accuracy and low latency and operates in real-time on user prompted and untrusted data sources.

2. Agent Alignment Checks, a chain-of-thought auditor that inspects agent reasoning for prompt injection and goal misalignment (This is the first open source guardrail to audit an LLM chain-of-thought in real time intended for injection defense to ensure an AI agent’s plans haven’t been hijacked by an adversarial input.)

3. CodeShield, a low latency online static analysis engine that detects insecure code outputs from LLMs, safeguarding against potential vulnerabilities. We previously released CodeShield as part of the Llama 3 launch, and now include it in this unified framework. Along with all these inbuilt scanners, LlamaFirewall also provides customizable regex and LLM based checks which can be configured according to the specific application threat model and use case.

“LlamaFirewall incorporates these guardrails into a unified policy engine. With LlamaFirewall, developers can construct custom pipelines, define conditional remediation strategies, and plug in new detectors. Like Snort, Zeek, or Sigma in traditional cybersecurity, LlamaFirewall aims to provide a collaborative security foundation—one where researchers, developers, and operators can share policies, compose defenses, and adapt to new threats in real time,” Chennabasappa said.

LlamaFirewall is designed with flexibility in mind, allowing it to work across a wide range of AI systems regardless of the underlying agentic framework. “It can be used with any AI system, open or closed, that allows developers to incorporate additional security mechanisms,” said Chennabasappa, emphasizing the tool’s broad applicability.

Positioned as a security-focused, open-source solution, LlamaFirewall takes a layered defense-in-depth approach. According to Chennabasappa, this strategy “draws on Meta’s extensive experience in large-scale systems and production environments to help ensure the secure development of AI applications and agents.”

Unlike proprietary tools that can limit visibility and customization, LlamaFirewall embraces openness. “Its open-source nature provides a transparent and extensible platform for community-built plugins, rules, and detectors,” Chennabasappa noted, highlighting how this transparency supports greater trust and adaptability in AI security practices.

Future plans and download

While LlamaFirewall currently focuses on prompt injection and insecure code generation, the developers see potential to broaden its scope to encompass other high-risk behaviors such as malicious code execution and unsafe tool-use to enable more comprehensive protection across the agent lifecycle.

LlamaFirewall is available for free on GitHub.

Must read:

Subscribe to the Help Net Security ad-free monthly newsletter to stay informed on the essential open-source cybersecurity tools. Subscribe here!

More about

LlamaFirewall: Open-source framework to detect and mitigate AI centric security risks

Why Meta created LlamaFirewall

What makes LlamaFirewall unique

Future plans and download

Featured news

Resources

Don't miss