OpenAI Releases GPT-5.3-Codex, a New Codex Model for Agent-Style Development
Published by Paul Redmond
OpenAI introduced GPT‑5.3‑Codex, a Codex model focused on agent-style development workflows where the model can use tools, operate a computer, and complete longer tasks end-to-end. OpenAI says GPT‑5.3‑Codex runs 25% faster for Codex users and is available to paid ChatGPT plans across the Codex app, CLI, IDE extension, and web, with API access planned once it’s safely enabled.
- Agentic coding model aimed at longer, tool-using workflows
- 25% faster interactions for Codex users (per OpenAI)
- Used internally to help debug training and support deployment (per OpenAI)
- Stronger performance on coding and computer-use benchmarks (details below)
- More interactive supervision in the Codex app (frequent updates + “steering”)
- First OpenAI model classified as “High capability” for cybersecurity tasks under OpenAI’s Preparedness Framework
What’s New
Frontier coding benchmarks (SWE‑Bench Pro, Terminal‑Bench 2.0)
OpenAI reports state-of-the-art performance on SWE‑Bench Pro (a multi-language software engineering benchmark), scoring 56.8% against 56.4% for GPT‑5.2‑Codex. It also reports a much larger jump on Terminal‑Bench 2.0, which measures the terminal skills a coding agent needs: 77.3%, up from 64.0%.
Stronger computer-use performance (OSWorld‑Verified)
OpenAI also highlights improved “computer use” performance on OSWorld‑Verified, a benchmark where models use vision to complete tasks in a desktop environment. GPT‑5.3‑Codex scores 64.7%, up from 38.2% for GPT‑5.2‑Codex; OpenAI notes humans score around 72%.
More interactive supervision in the Codex app
OpenAI describes GPT‑5.3‑Codex as more interactive in the Codex app, with more frequent updates while it works. Instead of waiting for a final answer, you can ask questions, discuss an approach, and adjust direction mid-task.
OpenAI also notes you can enable steering in the app under Settings → General → Follow-up behavior.
Used to help train and deploy itself
One of the more unusual details in the announcement is that OpenAI says early versions of GPT‑5.3‑Codex helped debug its own training run, support deployment, diagnose evaluation results, and assist with operational tasks like adapting harnesses and scaling GPU clusters as traffic changes.
Cybersecurity posture and staged access
OpenAI says GPT‑5.3‑Codex is the first model it classifies as “High capability” for cybersecurity-related tasks under its Preparedness Framework, and that it’s deploying additional mitigations and access controls as a result. Alongside the release, OpenAI also announced a new “Trusted Access for Cyber” pilot program.
Availability and infrastructure
OpenAI says GPT‑5.3‑Codex is available with paid ChatGPT plans anywhere Codex is available (app, CLI, IDE extension, and web), and that API access will follow once it’s safely enabled.
OpenAI also says GPT‑5.3‑Codex was co-designed for, trained with, and served on NVIDIA GB200 NVL72 systems.
Benchmarks (OpenAI appendix)
OpenAI includes the following benchmark results in the appendix of its release post; the values below are reproduced from that appendix.
| Benchmark | GPT‑5.3‑Codex | GPT‑5.2‑Codex | GPT‑5.2 |
|---|---|---|---|
| SWE‑Bench Pro (Public) | 56.8% | 56.4% | 55.6% |
| Terminal‑Bench 2.0 | 77.3% | 64.0% | 62.2% |
| OSWorld‑Verified | 64.7% | 38.2% | 37.9% |
| GDPval (wins or ties) | 70.9% | – | 70.9% (high) |
| Cybersecurity Capture The Flag | 77.6% | 67.4% | 67.7% |
OpenAI notes the evaluations in the post were run with xhigh reasoning effort.
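To put the table in perspective, here is a minimal Python sketch that computes the percentage-point gain of GPT‑5.3‑Codex over GPT‑5.2‑Codex on each benchmark. The scores are copied from the table above; the `SCORES` dictionary and the `point_gains` helper are illustrative names, not part of any OpenAI tooling. GDPval is left out because the table has no GPT‑5.2‑Codex value for it.

```python
# Scores (percent) copied from the table above as (GPT-5.3-Codex, GPT-5.2-Codex).
# GDPval is omitted because the GPT-5.2-Codex column has no value.
SCORES = {
    "SWE-Bench Pro (Public)": (56.8, 56.4),
    "Terminal-Bench 2.0": (77.3, 64.0),
    "OSWorld-Verified": (64.7, 38.2),
    "Cybersecurity Capture The Flag": (77.6, 67.4),
}


def point_gains(scores):
    """Return {benchmark: gain in percentage points}, rounded to one decimal."""
    return {name: round(new - old, 1) for name, (new, old) in scores.items()}


if __name__ == "__main__":
    for name, gain in point_gains(SCORES).items():
        print(f"{name}: {gain:+.1f} points")
```

By this measure, the largest gain is on OSWorld‑Verified (26.5 points) and the smallest is on SWE‑Bench Pro (0.4 points).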
Upgrade Notes
OpenAI says GPT‑5.3‑Codex is available now in ChatGPT’s Codex surfaces, and that it is “working to safely enable API access soon.” If your workflow depends on API availability, keep an eye on OpenAI’s platform updates.