OpenAI releases GPT-5.3-Codex-Spark, a real-time coding model

OpenAI has released a research preview of GPT-5.3-Codex-Spark, an ultra-fast model for real-time coding in Codex. It is available to ChatGPT Pro users in the latest versions of the Codex app, the command-line interface, and the VS Code extension.


The model delivers over 1,000 tokens per second when served on ultra-low-latency hardware while remaining capable of handling real-world coding tasks.
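For a rough sense of what that throughput means in practice, the minimal sketch below times a streamed response and derives tokens per second. It assumes the standard openai-python Responses API; the model slug is taken from the announcement and may differ in practice.

    import time
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    start = time.perf_counter()
    with client.responses.stream(
        model="gpt-5.3-codex-spark",  # assumed slug, based on the announcement
        input="Write a Python function that merges two sorted lists.",
    ) as stream:
        for event in stream:
            pass  # consume deltas as they are decoded
        final = stream.get_final_response()

    elapsed = time.perf_counter() - start
    tokens = final.usage.output_tokens
    print(f"{tokens} output tokens in {elapsed:.2f}s -> {tokens / elapsed:.0f} tok/s")

Note that the measured rate includes network and session setup, so it is a lower bound on the raw decode speed.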

“We’re sharing Codex-Spark on Cerebras as a research preview to ChatGPT Pro users so that developers can start experimenting early while we work with Cerebras to ramp up datacenter capacity, harden the end-to-end user experience, and deploy our larger frontier models,” the company said.

OpenAI describes GPT-5.3-Codex-Spark as a smaller version of GPT-5.3-Codex, optimized specifically for ultra-low-latency coding workflows.

Designed for interactive, in-editor iteration

Codex-Spark is optimized for interactive coding with Codex, enabling developers to make targeted edits, adjust logic, refine interfaces, and see results instantly. At launch, it is text-only and has a 128k-token context window. During the research preview, usage does not count toward standard limits; the preview instead has its own rate limits, which may be adjusted based on demand.

Developers can interact with the model in real time, redirect its output, interrupt tasks, and iterate with near-instant responses. Because Codex-Spark is tuned for speed, it makes minimal edits by default and does not automatically run tests unless explicitly requested.
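Client-side interruption of this kind can be approximated with the public streaming API. The sketch below assumes the openai-python SDK; user_requested_interrupt() is a hypothetical hook standing in for an editor keybinding or stop button.

    from openai import OpenAI

    def user_requested_interrupt() -> bool:
        # Hypothetical hook: in an editor this would be wired to a keypress
        # or a "stop" button rather than always returning False.
        return False

    client = OpenAI()
    with client.responses.stream(
        model="gpt-5.3-codex-spark",  # assumed slug, based on the announcement
        input="Refactor this handler to use early returns.",
    ) as stream:
        for event in stream:
            if event.type == "response.output_text.delta":
                print(event.delta, end="", flush=True)
            if user_requested_interrupt():
                break  # leaving the block closes the connection and abandons the response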

End-to-end latency improvements across the stack

OpenAI reduced end-to-end latency across its infrastructure by reworking how responses are streamed between client and server, rewriting key parts of the inference stack, and revising session initialization so the first token appears sooner. The company also introduced persistent WebSocket connections and optimized the Responses API, significantly reducing client/server overhead.
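OpenAI has not published the Codex wire protocol, but the rationale for persistent connections is general: one long-lived socket amortizes the TCP and TLS handshakes that a fresh HTTP request would pay on every edit. The sketch below is illustrative only, using the third-party websockets package, an invented message format, and a hypothetical endpoint.

    # Illustrative sketch only; not OpenAI's actual protocol.
    import asyncio
    import json
    import websockets  # third-party package: pip install websockets

    async def session(url: str) -> None:
        async with websockets.connect(url) as ws:  # handshake happens once
            for prompt in ("fix the off-by-one", "rename parse_args to parse_cli"):
                await ws.send(json.dumps({"input": prompt}))
                async for raw in ws:  # deltas stream back on the same socket
                    event = json.loads(raw)
                    if event.get("type") == "done":
                        break
                    print(event.get("delta", ""), end="", flush=True)

    asyncio.run(session("wss://example.invalid/codex"))  # hypothetical endpoint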

Codex-Spark runs on Cerebras’ Wafer Scale Engine 3, an accelerator purpose-built for low-latency inference, which adds an ultra-low-latency serving tier alongside OpenAI’s existing infrastructure.

“What excites us most about GPT-5.3-Codex-Spark is partnering with OpenAI and the developer community to discover what fast inference makes possible—new interaction patterns, new use cases, and a fundamentally different model experience. This preview is just the beginning,” said Sean Lie, CTO and co-founder of Cerebras.

OpenAI says the model has undergone baseline safety evaluations and is being deployed under the company’s standard safety processes.

OpenAI says it plans to explore combining real-time coding speed with longer-horizon reasoning capabilities in future Codex experiences.
