Traefik Triple Gate gains parallel safety pipelines, failover routing, and AI runtime controls

Traefik Labs has announced new capabilities that extend Traefik Hub’s Triple Gate architecture (API Gateway, AI Gateway, and MCP Gateway) with deeper runtime governance across the full AI workflow, including a composable multi-vendor safety pipeline with parallel guard execution, multi-provider failover routing, token-level cost controls, graceful error handling for agent-aware enforcement, IBM Granite Guardian integration, and a new Regex Guard capability that enables organizations to create custom guards.

These capabilities address a growing gap. Enterprises moving to autonomous agents face fragmented governance. CSP-native tools are limited to a single cloud, SaaS gateways route traffic through third-party infrastructure, and a growing number of “LLM proxies” and “MCP proxies” typically focus on a narrow layer of traffic.

An LLM proxy sees the model interaction but not the agent actions that follow, while an MCP proxy sees tool calls but not the LLM conversation that triggered them. Even as some begin to expand their scope, few offer composable multi-vendor safety, cost controls, resilience, and agent authorization within a unified architecture.

“You can’t govern the full AI workflow by looking at one layer at a time,” said Sudeep Goswami, CEO at Traefik Labs. “Enterprises need an infrastructure-native approach that enforces safety, cost control, resilience, and agent authorization from a unified and integrated platform they own and operate, across any environment. That’s what Traefik’s Triple Gate architecture makes possible, and today’s release takes it further.”

A composable safety pipeline that runs in parallel

Traefik Hub’s AI Gateway supports a composable, multi-vendor safety pipeline, allowing organizations to choose from multiple guardrail providers and combine them.

The pipeline spans four guard tiers, plus a parallel execution model:

  • Regex guard: A framework for organizations to write their own guards using regex-based pattern matching, at sub-millisecond speed with zero external dependencies. Teams define rules to block or mask content based on patterns they know best: Social Security numbers, credit card formats, API keys, or any pattern specific to their business. For well-understood patterns like PII, there’s no reason to pay the latency and cost of an AI model when a regex match is faster, cheaper, and deterministic.
  • Content guard (Microsoft Presidio): Global PII detection and masking with statistical NLP-based entity recognition, supporting both built-in and custom entity patterns.
  • LLM guard with NVIDIA NIMs: GPU-accelerated jailbreak detection, content safety across 22+ categories, and topic control provide semantic intelligence for threats that deterministic patterns can’t catch.
  • LLM guard with IBM Granite Guardian: IBM’s open-source safety models provide harm detection, jailbreak detection, topic control, hallucination detection, and RAG quality assessment, capabilities that other guard providers don’t yet offer.
  • Parallel guard execution: LLM-based guards, the heavyweight tiers that can take multiple seconds to execute, now run in parallel rather than in series. Guards are classified as critical (failure blocks and cancels) or optional (failure is logged). Multiple NVIDIA NIMs (jailbreak, content safety, topic control) and IBM Granite guards can all execute simultaneously, so total enforcement time equals the slowest guard, not the sum.
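The execution model described above (deterministic guards plus concurrent LLM-based guards, with critical versus optional failure handling) can be sketched as follows. This is an illustrative simulation, not Traefik Hub's implementation; the guard names, the critical/optional flags, and the SSN regex are assumptions for the example.

```python
import asyncio
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

async def regex_guard(prompt: str) -> bool:
    """Deterministic, sub-millisecond tier: block prompts containing SSNs."""
    return SSN_PATTERN.search(prompt) is None

async def llm_guard(prompt: str, verdict: bool) -> bool:
    """Stand-in for a slow LLM-based guard (e.g. jailbreak detection)."""
    await asyncio.sleep(0.1)  # simulate model inference latency
    return verdict

async def enforce(prompt: str) -> bool:
    # All guards run concurrently via asyncio.gather, so total enforcement
    # time tracks the slowest guard, not the sum of all guards.
    guards = [
        ("regex",         True,  regex_guard(prompt)),         # critical
        ("jailbreak",     True,  llm_guard(prompt, True)),     # critical
        ("topic-control", False, llm_guard(prompt, True)),     # optional
    ]
    results = await asyncio.gather(*(coro for _, _, coro in guards))
    allowed = True
    for (name, critical, _), passed in zip(guards, results):
        if not passed:
            if critical:
                allowed = False  # critical failure blocks the request
            else:
                print(f"optional guard {name} failed (logged only)")
    return allowed

# A prompt containing an SSN trips the critical regex guard and is blocked.
print(asyncio.run(enforce("What is 123-45-6789's address?")))  # prints False
```

The point of the sketch is the classification: a failing critical guard cancels the request, while a failing optional guard is only logged, and neither adds latency beyond the slowest guard in the batch.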

Operational controls: Resilience, cost control, and agent-aware enforcement

  • Failover router: Automatic failover across LLM providers and models via circuit breaker chain. The router switches both providers and models, enabling cost-optimized degradation while all safety policies remain enforced. Organizations can mix OpenAI, Anthropic, self-hosted models, and NVIDIA NIMs in a single failover chain, something not possible when governance is locked to a single CSP.
  • Token rate limiting and quota management: Tracks input, output, and total tokens independently, with rate limiting for spikes and quotas for hard budget caps. Per-user, per-team, per-endpoint, or per-API-key tracking via JWT claims. Proactive token estimation blocks abusive requests before the prompt reaches the LLM, rather than enforcing limits reactively after resources are consumed.
  • Error handling: When a guardrail blocks a request, traditional gateways return an HTTP 403 that breaks agent control flow and crashes multi-step workflows. Traefik Hub’s guardrails can now be configured to return structured, schema-compliant refusal responses (HTTP 200 with a refusal message) that agents and applications can process gracefully. Agents continue operating, middleware chains stay intact, and users see conversational refusals instead of technical errors. This is what makes the runtime governance agent-aware: enforcement that works with autonomous and agentic workflows rather than breaking them.
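The proactive token-estimation idea above can be illustrated with a minimal sketch. The 4-characters-per-token heuristic, the class name, and the admission logic are assumptions for illustration, not Traefik Hub internals.

```python
def estimate_tokens(prompt: str) -> int:
    # Rough heuristic: roughly 4 characters per token for English text.
    return max(1, len(prompt) // 4)

class TokenQuota:
    """Hard budget cap tracked per user, team, endpoint, or API key."""

    def __init__(self, limit: int):
        self.limit = limit  # total-token budget
        self.used = 0       # tokens consumed so far

    def admit(self, prompt: str, max_output: int) -> bool:
        # Block BEFORE the prompt reaches the LLM if the estimated input
        # plus the reserved output budget would exceed the remaining quota.
        projected = self.used + estimate_tokens(prompt) + max_output
        if projected > self.limit:
            return False  # rejected proactively; no LLM cost incurred
        self.used = projected
        return True

quota = TokenQuota(limit=1000)
print(quota.admit("Summarize this report.", max_output=500))  # prints True
print(quota.admit("x" * 4000, max_output=500))                # prints False
```

The design choice worth noting is the order of operations: estimation happens at admission time, so an over-budget request is refused before any provider tokens are spent, rather than after the fact.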

Traefik Hub’s Triple Gate is the unified infrastructure-layer approach that governs LLM content safety, cost, and resilience alongside agent authorization through Tools/Tasks/Transactions-Based Access Control (TBAC for MCP Gateway), on infrastructure you operate, from public cloud to air-gapped environments. It works with any agent platform (NemoClaw, LangChain, CrewAI, or custom) because it governs the traffic, not the runtime.

Traefik’s infrastructure-native approach is gaining traction with neoclouds, service providers, and enterprises building GPU-accelerated AI infrastructure. Organizations that have standardized on Traefik for application networking can add the AI Gateway and MCP Gateway capabilities through a single in-place upgrade, with no re-architecture, no traffic migration, and no additional proxies in the data path.
