OpenAI Launches GPT-5.4: One Million Token Context, Native Computer Use, and 47% Fewer Tokens
OpenAI's newest model ships in three tiers — standard, Thinking, and Pro — with a 1M token context window, native computer control, and Excel/Sheets integrations. Here's what developers and businesses need to know.
OpenAI shipped GPT-5.4 on Thursday, March 5th, and for once the release notes match the hype. The new model arrives in three tiers, posts record scores on several benchmarks by OpenAI's account, and introduces a native Computer Use capability that lets the model operate a computer the way a human would. Here's what matters.
Three Versions, One Release
GPT-5.4 comes in three tiers:
- GPT-5.4 — the standard model, available via API
- GPT-5.4 Thinking — a reasoning variant with extended chain-of-thought; available to all ChatGPT paid subscribers (Plus and above)
- GPT-5.4 Pro — optimized for the most demanding tasks; reserved for ChatGPT Pro ($200/month) and Enterprise users
All three are available through OpenAI's API. The Codex coding platform gains access to both the standard and Pro variants. Free ChatGPT users will encounter GPT-5.4 when the platform auto-routes their queries to it.
What's Actually New
1 Million Token Context Window
The API version supports up to 1 million tokens of input — by far the largest context window OpenAI has offered. That's enough to load entire codebases, lengthy legal documents, or months of conversation history into a single prompt. There's a catch: pricing doubles once you exceed 272,000 tokens, so it's more of a capability ceiling than a flat-rate feature.
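To see how the long-context surcharge plays out, here is a minimal Python sketch. The 272,000-token threshold and the doubling come from the pricing rule described above. The base price is a placeholder, and the sketch assumes the doubled rate applies to the entire request rather than only to the tokens past the threshold. That is our assumption, so check OpenAI's pricing documentation before budgeting.

```python
# Threshold and multiplier as described in the article.
LONG_CONTEXT_THRESHOLD = 272_000
LONG_CONTEXT_MULTIPLIER = 2.0


def estimate_input_cost(input_tokens: int, base_price_per_million: float) -> float:
    """Estimate the input cost of one request.

    base_price_per_million is a hypothetical placeholder price.
    Assumption: once a request exceeds the threshold, the doubled
    rate applies to all of its input tokens.
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    rate = base_price_per_million
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        rate *= LONG_CONTEXT_MULTIPLIER
    return input_tokens / 1_000_000 * rate


# Compare a request at the threshold with one that fills the full window,
# using an illustrative base price of 1.0 per million tokens.
for tokens in (272_000, 1_000_000):
    print(tokens, estimate_input_cost(tokens, base_price_per_million=1.0))
```

Under these assumptions, filling the full window costs well over twice as much per request as staying at the threshold, which is why it pays to trim context where possible.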
47% Token Efficiency Gains
OpenAI reports that GPT-5.4 solves equivalent tasks using significantly fewer tokens than its predecessors — up to 47% fewer on some workloads. For developers paying per token, this alone could meaningfully reduce costs on high-volume applications.
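As a rough illustration of the best case, the sketch below applies the reported 47% reduction to a hypothetical workload. The function name and the baseline figure are made up for the example, and because OpenAI's number is an upper bound seen on some workloads, real savings will often be smaller.

```python
def projected_tokens(baseline_tokens: int, reduction: float = 0.47) -> int:
    """Tokens needed after applying a fractional reduction.

    The 0.47 default is the best-case figure OpenAI reports;
    treat it as an upper bound, not a guarantee.
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return round(baseline_tokens * (1.0 - reduction))


# Hypothetical workload that used 10 million tokens on an older model.
print(projected_tokens(10_000_000))
```

Because billing is per token, the cost of the workload shrinks in the same proportion, provided per-token prices are unchanged.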
Native Computer Use Mode
GPT-5.4 ships with a "native" Computer Use mode through the API and Codex. The model can navigate a user's operating system, interact with applications, and execute tasks across software — similar to Anthropic's Computer Use capability, but now integrated directly into the model rather than layered on top. This is a significant step toward practical desktop automation.
Excel and Google Sheets Integrations
OpenAI is rolling out ChatGPT integrations that let GPT-5.4 work directly with cells and formulas in Microsoft Excel and Google Sheets. Users can run analysis, generate content, and automate tasks from within their spreadsheets. This follows a similar move by Anthropic with Claude for Finance, and signals a continued push into enterprise knowledge-work automation.
New Tool Search API
The API's tool-calling system has been redesigned. Previously, every request had to include the full definition of every available tool, which grew expensive as tool counts rose. The new Tool Search system lets the model look up tool definitions on demand, reducing token overhead and latency in agentic pipelines with large tool libraries.
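The idea of on-demand lookup can be sketched in a few lines of Python. This is not OpenAI's Tool Search API. `ToolDef`, `ToolRegistry`, and the sample tools are hypothetical names invented for illustration, and the keyword-overlap scoring is a stand-in for whatever retrieval the real system uses. The point is only that a query returns a handful of matching definitions instead of the whole library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDef:
    """Hypothetical tool definition: a name plus a short description."""
    name: str
    description: str


class ToolRegistry:
    """Toy registry that returns only the definitions matching a query."""

    def __init__(self, tools):
        self._tools = {tool.name: tool for tool in tools}

    def search(self, query: str, limit: int = 3) -> list[ToolDef]:
        terms = set(query.lower().split())
        scored = []
        for tool in self._tools.values():
            text = f"{tool.name} {tool.description}".lower().replace("_", " ")
            score = len(terms & set(text.split()))  # count shared keywords
            if score:
                scored.append((score, tool.name, tool))
        # Best match first; ties broken alphabetically by tool name.
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [tool for _, _, tool in scored[:limit]]


registry = ToolRegistry([
    ToolDef("get_weather", "Look up the current weather forecast for a city"),
    ToolDef("send_email", "Send an email message to a recipient"),
    ToolDef("create_invoice", "Create a billing invoice for a customer"),
])

# Only the matching definition would need to be placed in the model's context.
print([tool.name for tool in registry.search("weather forecast")])
```

With three tools the saving is trivial, but with hundreds of definitions, sending only the few that match a query is where the reduction in token overhead would come from.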
Benchmark Results
The numbers OpenAI is citing:
| Benchmark or metric | Result |
|---|---|
| GDPval (knowledge work) | 83% — record score |
| OSWorld-Verified (computer use) | Record |
| WebArena Verified (computer use) | Record |
| APEX-Agents (law & finance) | #1 on Mercor's leaderboard |
| Hallucination reduction vs GPT-5.2 | 33% fewer errors in individual claims |
| Overall response accuracy improvement | 18% fewer errors overall |
Mercor CEO Brendan Foody, whose platform administers the APEX-Agents benchmark for professional skills in law and finance, described GPT-5.4 as excelling at "long-horizon deliverables such as slide decks, financial models, and legal analysis."
Safety: Chain-of-Thought Monitoring
OpenAI included a new evaluation specifically targeting chain-of-thought (CoT) faithfulness — whether a reasoning model's visible thought process actually represents what it's doing. AI safety researchers have flagged this as a risk: a model could, in principle, "think" one thing internally while showing another. OpenAI's tests on GPT-5.4 Thinking suggest the model is unlikely to hide its reasoning, though the researchers note this is an ongoing area of study.
Context: A Fast-Moving Week
GPT-5.4 arrives just two days after GPT-5.3 Instant. Anthropic reported March 2nd as its largest single day ever for new user sign-ups, which helps explain the rapid release cadence. The competition between OpenAI and Anthropic is now measured in days, not quarters.
Bottom Line for Developers
If you're building agentic pipelines, the combination of 1M token context, Tool Search, and native computer use makes GPT-5.4 a meaningful upgrade over previous OpenAI models. The token efficiency gains are particularly relevant for production systems. The Excel/Sheets integrations are less interesting for developers but signal where OpenAI is targeting enterprise adoption.