OpenAI joins the race in AI-assisted code security

OpenAI introduced Codex Security, an AI agent that reviews codebases to find, verify, and help fix software vulnerabilities. The launch comes a few weeks after rival Anthropic unveiled its Claude Code Security tool.

Codex Security (Source: OpenAI)

The feature is available in research preview via Codex Web for ChatGPT Pro, Enterprise, Business, and Edu customers, with free access for the next month. Previously known as Aardvark, Codex Security launched last year in a private beta with a small group of customers.

The company says early internal deployments uncovered serious vulnerabilities that were fixed quickly, while external testing helped refine onboarding and context-sharing for better results.

“Over the last 30 days, Codex Security scanned more than 1.2 million commits across external repositories in our beta cohort, identifying 792 critical findings and 10,561 high-severity findings. Critical issues appeared in under 0.1% of scanned commits, showing that the system can identify security-impacting issues in large volumes of code while minimizing noise to reviewers,” the company said in the announcement.
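As a quick sanity check on the quoted figures, the arithmetic can be reproduced in a few lines of Python. This is only a sketch: it treats "more than 1.2 million" as exactly 1,200,000 commits and assumes each finding sits in a different commit, so the computed rates are upper bounds rather than OpenAI's own numbers.

```python
# Figures quoted in the announcement.
commits_scanned = 1_200_000   # "more than 1.2 million", so a lower bound (assumption)
critical_findings = 792
high_findings = 10_561

# Findings per scanned commit. If several findings share one commit,
# the share of affected commits is lower than these rates.
critical_rate = critical_findings / commits_scanned
high_rate = high_findings / commits_scanned

print(f"critical: {critical_rate:.3%} of commits")  # critical: 0.066% of commits
print(f"high:     {high_rate:.3%} of commits")      # high:     0.880% of commits
```

The critical rate of roughly 0.066% is consistent with the "under 0.1%" claim, while high-severity findings work out to a little under 1% of scanned commits.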

OpenAI expects detection quality and signal-to-noise ratios to continue improving as adoption grows.

How it works in practice

Using a threat model of the project as context, the system searches for vulnerabilities and prioritizes findings based on likely real-world impact. Instead of relying only on pattern matching, it validates issues in sandboxed environments to reduce false positives.

When configured with an environment tailored to a project, the tool tests potential issues against the running system. This deeper validation further lowers false positive rates and, in some cases, produces working proofs of concept that help security teams confirm risk and plan remediation.

For confirmed issues, Codex Security suggests patches that align with existing code behavior and system design. This approach helps teams review and merge fixes with lower regression risk.

“Users can filter findings to focus on issues that matter most to their team and have the highest security impact,” the company notes.
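The flow described in this section (validate candidate issues, rank them by likely impact, then let users filter) can be sketched roughly as follows. This is an illustration only, not OpenAI's implementation or API: `Finding`, `triage`, the severity scale, the `impact` score, and the stand-in validator are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical severity scale; the article only names "critical" and "high".
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class Finding:
    title: str
    severity: str   # one of the SEVERITY_RANK keys
    impact: float   # hypothetical 0..1 estimate of real-world impact


def triage(
    candidates: Iterable[Finding],
    validate: Callable[[Finding], bool],
    min_severity: str = "high",
) -> List[Finding]:
    """Keep validated findings at or above min_severity, most severe first."""
    floor = SEVERITY_RANK[min_severity]
    # Step 1: drop candidates that could not be reproduced (false positives).
    confirmed = [f for f in candidates if validate(f)]
    # Step 2: apply the user's filter.
    visible = [f for f in confirmed if SEVERITY_RANK[f.severity] >= floor]
    # Step 3: rank by severity, then by estimated impact.
    return sorted(
        visible,
        key=lambda f: (SEVERITY_RANK[f.severity], f.impact),
        reverse=True,
    )


candidates = [
    Finding("SQL injection in search handler", "critical", 0.9),
    Finding("Verbose error page", "low", 0.2),
    Finding("Path traversal in upload", "high", 0.7),
    Finding("Suspected SSRF (not reproducible)", "high", 0.6),
]

# Stand-in for sandbox validation: pretend these titles were reproduced.
reproduced = {
    "SQL injection in search handler",
    "Verbose error page",
    "Path traversal in upload",
}

results = triage(candidates, validate=lambda f: f.title in reproduced)
for f in results:
    print(f.severity, "-", f.title)
```

In this toy run the unreproduced SSRF candidate is discarded at the validation step and the low-severity item is hidden by the filter, leaving two findings for review. In a real system the validator would be the expensive part, since it stands in for actually exercising the issue in a sandbox.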
