OpenAI Launches GPT-5.4: One Million Token Context, Native Computer Use, and 47% Fewer Tokens

OpenAI shipped GPT-5.4 on Thursday, March 5th — and for once, the release notes match the hype. The new model arrives in three flavors, breaks records on multiple benchmarks, and introduces a native Computer Use capability that lets the model control a computer like a human operator. Here's what matters.

Three Versions, One Release

GPT-5.4 comes in three tiers:

GPT-5.4 — the standard model, available via API
GPT-5.4 Thinking — a reasoning variant with extended chain-of-thought; available to all ChatGPT paid subscribers (Plus and above)
GPT-5.4 Pro — optimized for the most demanding tasks; reserved for ChatGPT Pro ($200/month) and Enterprise users

All three are available through OpenAI's API. The Codex coding platform gains access to both the standard and Pro variants. Free ChatGPT users will encounter GPT-5.4 when the platform auto-routes their queries to it.

What's Actually New

1 Million Token Context Window

The API version supports up to 1 million tokens of input, among the largest context windows OpenAI has offered. That's enough to load entire codebases, lengthy legal documents, or months of conversation history into a single prompt. There's a catch: pricing doubles once you exceed 272,000 tokens, so it's more of a capability ceiling than a flat-rate feature.
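To see what the 272,000-token threshold does to a bill, here is a rough Python sketch. Only the threshold and the 1 million token ceiling come from the figures above. The per-token price is a placeholder, and whether the doubled rate applies to the whole prompt or only to the tokens past the threshold is not stated here, so the sketch models both as an explicit assumption.

```python
# Rough cost estimator. The price and the surcharge semantics are assumptions,
# not published OpenAI pricing rules.
LONG_CONTEXT_THRESHOLD = 272_000  # tokens; figure cited in the article
MAX_CONTEXT = 1_000_000           # tokens; figure cited in the article


def input_cost(tokens: int, base_price_per_million: float,
               surcharge_on_whole_prompt: bool = True) -> float:
    """Estimate the input cost in dollars for a single prompt.

    surcharge_on_whole_prompt=True  -> the doubled rate covers every token
    surcharge_on_whole_prompt=False -> only tokens past the threshold double
    """
    if tokens < 0 or tokens > MAX_CONTEXT:
        raise ValueError("token count outside the supported context window")
    per_token = base_price_per_million / 1_000_000
    if tokens <= LONG_CONTEXT_THRESHOLD:
        return tokens * per_token
    if surcharge_on_whole_prompt:
        return tokens * per_token * 2
    standard_part = LONG_CONTEXT_THRESHOLD * per_token
    surcharged_part = (tokens - LONG_CONTEXT_THRESHOLD) * per_token * 2
    return standard_part + surcharged_part
```

At a placeholder price of $1.00 per million input tokens, a 300,000-token prompt comes to about $0.60 if the whole prompt is surcharged and about $0.33 if only the excess is. Check the actual pricing page before relying on either reading.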

47% Token Efficiency Gains

OpenAI reports that GPT-5.4 solves equivalent tasks using significantly fewer tokens than its predecessors — up to 47% fewer on some workloads. For developers paying per token, this alone could meaningfully reduce costs on high-volume applications.
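As a back-of-the-envelope illustration, the sketch below converts a token reduction into a monthly bill. The 47% figure is the upper bound OpenAI reports for some workloads. The function name, workload size, and price are all hypothetical placeholders.

```python
def monthly_cost(tokens_per_task: float, tasks_per_month: int,
                 price_per_million: float, reduction: float = 0.0) -> float:
    """Monthly spend in dollars after applying a fractional token reduction.

    reduction=0.47 models the best case cited in the article; real workloads
    may see less.
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    effective_tokens = tokens_per_task * (1 - reduction)
    return effective_tokens * tasks_per_month * price_per_million / 1_000_000


# Hypothetical workload: 10,000 tokens per task, 100,000 tasks per month,
# $10 per million tokens.
before = monthly_cost(10_000, 100_000, 10.0)
after = monthly_cost(10_000, 100_000, 10.0, reduction=0.47)
```

With these placeholder numbers, spend drops from roughly $10,000 to roughly $5,300 a month. Because 47% is an upper bound, treat that as the optimistic end of the range.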

Native Computer Use Mode

GPT-5.4 ships with a "native" Computer Use mode through the API and Codex. The model can navigate a user's operating system, interact with applications, and execute tasks across software — similar to Anthropic's Computer Use capability, but now integrated directly into the model rather than layered on top. This is a significant step toward practical desktop automation.

Excel and Google Sheets Integrations

OpenAI is rolling out ChatGPT integrations that let GPT-5.4 plug directly into Microsoft Excel and Google Sheets cells and formulas. Users can run analysis, generate content, and automate tasks from within their spreadsheets. This follows similar moves from Anthropic's Claude for Finance, and signals a continued push into enterprise knowledge-work automation.

New Tool Search API

The API's tool-calling system has been redesigned. Previously, every request had to include the full set of available tool definitions up front, which became expensive as tool counts grew. The new Tool Search system lets the model look up tool definitions on demand, reducing token overhead and latency in agentic pipelines with large tool libraries.
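The general idea of on-demand lookup can be sketched in a few lines of Python. This is not OpenAI's Tool Search API: the ToolRegistry class, its methods, the keyword scoring, and the four-characters-per-token estimate are all hypothetical, chosen only to show why fetching a few matching definitions is cheaper than sending the whole library.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToolRegistry:
    """Hypothetical in-memory tool library, searchable by keyword."""
    tools: dict = field(default_factory=dict)

    def register(self, name: str, description: str, parameters: dict) -> None:
        self.tools[name] = {
            "name": name,
            "description": description,
            "parameters": parameters,
        }

    def search(self, query: str, limit: int = 3) -> list:
        """Return up to `limit` definitions whose name or description
        contains words from the query, best match first."""
        words = set(query.lower().split())
        scored = []
        for tool in self.tools.values():
            text = f"{tool['name']} {tool['description']}".lower()
            score = sum(1 for word in words if word in text)
            if score:
                scored.append((score, tool["name"], tool))
        # Highest score first; ties broken alphabetically by tool name.
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [tool for _, _, tool in scored[:limit]]


def approx_tokens(definitions: list) -> int:
    """Crude size estimate: about four characters per token (an assumption)."""
    return len(json.dumps(definitions)) // 4
```

In a pipeline built this way, the up-front cost is approx_tokens(list(registry.tools.values())) on every request, while the on-demand cost is approx_tokens(registry.search(query)) only when a tool is actually needed. The gap widens as the library grows.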

Benchmark Results

The numbers OpenAI is citing:

Benchmark or metric	Result
GDPval (knowledge work)	83% — record score
OSWorld-Verified (computer use)	Record
WebArena Verified (computer use)	Record
APEX-Agents (law & finance)	#1 on Mercor's leaderboard
Hallucination reduction vs GPT-5.2	33% fewer errors in individual claims
Overall response accuracy improvement	18% fewer errors overall

Mercor CEO Brendan Foody, whose platform administers the APEX-Agents benchmark for professional skills in law and finance, described GPT-5.4 as excelling at "long-horizon deliverables such as slide decks, financial models, and legal analysis."

Safety: Chain-of-Thought Monitoring

OpenAI included a new evaluation specifically targeting chain-of-thought (CoT) faithfulness: whether a reasoning model's visible thought process actually reflects what the model is doing. AI safety researchers have flagged this as a risk, since a model could, in principle, "think" one thing internally while showing another. OpenAI's tests on GPT-5.4 Thinking suggest the model is unlikely to hide its reasoning, though OpenAI's researchers note this is an ongoing area of study.

Context: A Fast-Moving Week

GPT-5.4 follows GPT-5.3 Instant by just two days. Anthropic reported March 2nd as its largest single day ever for new user sign-ups, which helps explain the rapid release cadence. The competition between OpenAI and Anthropic is now measured in days, not quarters.

Bottom Line for Developers

If you're building agentic pipelines, the combination of a 1M-token context, Tool Search, and native computer use makes GPT-5.4 a meaningful upgrade over previous OpenAI models. The token efficiency gains are particularly relevant for production systems billed per token. The Excel/Sheets integrations are less interesting for developers but signal where OpenAI is targeting enterprise adoption.
