MiniMax M2.5: Open-Source Coding Agent at 1/20th the Cost of Claude Opus
MiniMax released M2.5 on February 12, 2026 — a 230B open-weight model that matches Claude Opus 4.6 on coding benchmarks while costing just $1 per hour of continuous inference. Here's what makes it remarkable.
For years, developers have faced a stark choice: pay premium prices for state-of-the-art proprietary models, or accept compromises with cheaper open-source alternatives. MiniMax M2.5, released on February 12, 2026, is a direct challenge to that tradeoff.
The model — released with full weights under an MIT license — delivers coding and agentic performance that benchmarks favorably against Claude Opus 4.6 and GPT-5.3-Codex, at a fraction of the cost. The headline number: $1 per hour of continuous inference at 100 tokens per second. At 50 tokens/s, it drops to $0.30/hour. MiniMax describes this as "intelligence too cheap to meter."
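To put the per-hour pricing in more familiar per-token terms, here is a quick conversion using only the figures quoted above (the helper name is ours, for illustration):

```python
# Back-of-envelope: convert per-hour continuous-inference pricing
# into a per-million-token figure.
def cost_per_million_tokens(dollars_per_hour: float, tokens_per_second: float) -> float:
    tokens_per_hour = tokens_per_second * 3600
    return dollars_per_hour / tokens_per_hour * 1_000_000

fast = cost_per_million_tokens(1.00, 100)  # $1/hour at 100 tokens/s
slow = cost_per_million_tokens(0.30, 50)   # $0.30/hour at 50 tokens/s
print(f"${fast:.2f}/M tokens at 100 tok/s, ${slow:.2f}/M tokens at 50 tok/s")
# → $2.78/M tokens at 100 tok/s, $1.67/M tokens at 50 tok/s
```

Under $3 per million tokens for sustained generation is the arithmetic behind the "too cheap to meter" framing.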
Architecture: 230B Parameters, Only 10B Active
M2.5 is built on a Mixture of Experts (MoE) architecture. The total parameter count is 230 billion, but during inference the model only activates 10 billion parameters at a time. This sparse design is the core reason for its speed and cost profile: the model stores a massive amount of learned knowledge in its dormant parameters, while executing tasks with the efficiency of a much lighter system.
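To make the sparse-activation idea concrete, here is a toy sketch of top-k MoE routing. The expert count, dimensions, and top-k value are illustrative only, not M2.5's actual configuration:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, experts, router_weights, top_k=2):
    # The router scores every expert, but only the top_k actually run;
    # the rest stay dormant for this token.
    scores = [sum(w * t for w, t in zip(row, token)) for row in router_weights]
    probs = softmax(scores)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Output is the probability-weighted sum of the selected experts.
    out = [0.0] * len(token)
    norm = sum(probs[i] for i in chosen)
    for i in chosen:
        y = experts[i](token)
        out = [o + probs[i] / norm * v for o, v in zip(out, y)]
    return out, chosen

# Four toy "experts" (each just scales the input); only two run per token.
experts = [lambda t, s=s: [s * x for x in t] for s in (0.5, 1.0, 2.0, 4.0)]
router = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
out, chosen = moe_forward([1.0, 2.0], experts, router, top_k=2)
```

This is why a 230B-parameter model can run with roughly the compute cost of a 10B dense model: total parameters determine storage, but only the activated experts determine per-token FLOPs.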
The weights are available under an MIT license — meaning full commercial use and self-hosting are permitted. For organizations concerned with data privacy or vendor lock-in, that matters enormously.
Forge RL and the Spec-Writing Tendency
The most technically interesting element of M2.5 is its training methodology: the Forge RL framework, a reinforcement learning approach designed specifically for coding and agent scenarios rather than general-purpose tasks.
The model was trained across more than 200,000 real-world environments, spanning web, Android, iOS, and desktop applications, covering the entire development lifecycle from greenfield system design to code review. Languages include Python, TypeScript, Go, Rust, Kotlin, Java, C, C++, PHP, Dart, Ruby, and Lua.
A notable emergent behavior from training is what MiniMax calls "Spec-writing tendency": before generating code, M2.5 autonomously decomposes and plans the features, structure, and UI design of a project — thinking like a software architect rather than diving straight into implementation. This was not explicitly programmed; it emerged from the reinforcement learning process.
Benchmark Performance
On SWE-Bench Verified (a standard for real-world software engineering tasks), M2.5 scores 80.2% — competitive with the top proprietary models. On Multi-SWE-Bench it reaches 51.3%, and on BrowseComp (web search and research tasks with context management) it achieves 76.3%.
Speed is also a highlight. M2.5 completes SWE-Bench evaluations 37% faster than its predecessor M2.1, matching the inference speed of Claude Opus 4.6.
On coding agent harnesses specifically:
- Droid: 79.7% (M2.5) vs. 78.9% (Opus 4.6)
- OpenCode: 76.1% (M2.5) vs. 75.9% (Opus 4.6)
These are narrow margins, but the direction is clear: an open-weight model is now trading blows with the best proprietary coding models, at a cost structure that is an order of magnitude lower.
What This Means for Developers
The practical implications are significant:
Self-hosted agentic infrastructure becomes viable. With M2.5's weights available under MIT and an efficient MoE architecture, teams can run a SOTA-class coding agent on their own infrastructure without per-token billing from a third-party API.
Agent economics change. Long-running agentic workflows — the kind that might spin for hours doing iterative code generation, testing, and debugging — become dramatically cheaper. At $0.30/hour for a background agent, this changes what's worth automating.
Open-source catches up to closed. The performance gap between open-weight and proprietary frontier models continues to close. M2.5 is arguably the strongest evidence yet that open models can compete at the highest tier of coding tasks.
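The agent-economics point can be made concrete with quick arithmetic on the quoted $0.30/hour background-agent rate (the scenarios below are illustrative):

```python
# Cost of long-running agent sessions at the quoted $0.30/hour rate.
def run_cost(hours: float, dollars_per_hour: float = 0.30) -> float:
    return hours * dollars_per_hour

overnight = run_cost(8)    # agent iterating on a bug all night
work_week = run_cost(40)   # agent running continuously all week
print(f"overnight: ${overnight:.2f}, full week: ${work_week:.2f}")
# → overnight: $2.40, full week: $12.00
```

At a few dollars per multi-hour session, tasks that were previously too expensive to leave running, such as exhaustive test-and-fix loops, start to look routine.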
Caveats
M2.5 is specialized. It excels at coding, agentic tool use, search, and office tasks. On abstract reasoning or creative writing benchmarks, it does not claim the same dominance. And as with all benchmark comparisons, real-world performance will depend heavily on the specific use case and infrastructure setup.
Getting Started
MiniMax M2.5 is available through the MiniMax API and as open weights for self-hosting. The MIT license makes it straightforward to integrate into commercial products.
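Many self-hosted serving stacks expose an OpenAI-compatible chat endpoint, so a request to a locally hosted M2.5 might be built like the sketch below. The base URL and model id are placeholders (assumptions), not official MiniMax values; check MiniMax's documentation for the real ones:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    # Standard OpenAI-style chat completion payload.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(
    "http://localhost:8000/v1",  # e.g. a local OpenAI-compatible server (placeholder)
    "minimax-m2.5",              # placeholder model id
    "Write a function that reverses a linked list.",
)
# urllib.request.urlopen(req) would send it; omitted here.
```

The same request shape works against a hosted API by swapping the base URL and adding an Authorization header.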
For teams building agentic workflows — especially in software development contexts — M2.5 is worth serious evaluation. The combination of frontier-class performance, open weights, and minimal cost is rare.
Sources: MiniMax M2.5 announcement · i-scoop analysis · Artificial Analysis