MiniMax M2.5 is a frontier-level large language model released on February 12, 2026 by Shanghai-based AI company MiniMax. It launched just weeks after the company's Hong Kong IPO and is positioned as a high-end model that competes with leading systems such as GPT-5.2, Claude Opus 4.6, and GLM-5 at a radically lower cost. The headline promise is simple: you can run a powerful agent or assistant for around one dollar per hour of continuous operation, even at very high token throughput. This pricing makes M2.5 especially attractive for startups and enterprises that want frontier-grade capabilities without committing to the much higher per-token costs of US or European providers.

Under the hood, MiniMax M2.5 uses a Mixture-of-Experts architecture with about 230 billion total parameters, of which roughly 10 billion (about 4 percent) are active per token during inference. Because only that small fraction of the weights takes part in each forward pass, the model can deliver top-end reasoning and coding ability while keeping per-token compute far below what you would expect from a dense model of similar capability.

Two main API variants are offered. The Lightning version is tuned for very high throughput and can sustain around 100 tokens per second per instance. The Standard version gives up some peak speed in exchange for even better cost efficiency, making it suitable for large fleets of background agents and batch jobs.

On benchmarks, M2.5 lands firmly in the frontier bracket. It reaches 80.2 percent on SWE-Bench Verified, a widely tracked software-engineering benchmark that evaluates whether a model can read, modify, and fix real-world codebases. On Multi-SWE-Bench, which extends the same kind of issue-resolution task across multiple programming languages, M2.5 scores 51.3 percent and takes first place among reported models at launch time. The model also scores 76.3 percent on BrowseComp, a test that measures how well an agent can read, navigate, and act on live web content while solving tasks.
Taken together, these benchmark results show that MiniMax M2.5 is not just a chat model; it is explicitly tuned for agentic behavior and complex tool-assisted workflows.

The cost structure is one of the most striking aspects of the release. MiniMax emphasizes that a single M2.5 instance running continuously at 100 tokens per second can be operated for about one US dollar per hour. Internal estimates suggest that keeping four such instances online continuously for an entire year would cost on the order of ten thousand dollars. Spread over 4 × 8,760 instance-hours, that annual figure comes to roughly $0.29 per instance-hour, so it presumably reflects the cheaper Standard tier; at the full one-dollar rate, four instances would cost about $35,000 a year. When you compare this with conventional per-million-token pricing from other providers, the difference is stark. For example, one popular frontier model charges roughly five dollars per million input tokens and twenty-five dollars per million output tokens, while even the aggressively priced GLM-5 sits at one dollar per million input tokens and just over three dollars per million output tokens. In practice, M2.5's pricing makes always-on agents economically viable for mid-sized companies rather than just tech giants.

A key ingredient behind M2.5's performance is MiniMax's in-house reinforcement-learning framework called Forge. Forge is designed as an agent-native RL system: instead of hard-wiring training around a single agent scaffold, it decouples the training engine from the way agents are constructed. That makes it easier for the model to generalize across different tool interfaces, prompting styles, and orchestration patterns. Forge relies on large-scale training across more than two hundred thousand real-world environments, many of which mirror internal company workflows such as document analysis, office automation, and software maintenance. This rich environment mix helps the model learn behaviors that transfer cleanly into production settings.

MiniMax highlights several technical innovations inside Forge. One is CISPO, short for Clipped Importance Sampling Policy Optimization, which changes how importance sampling is applied during RL updates.
Instead of zeroing out the updates of tokens whose probability ratio falls outside the clip range, as conventional token-level clipping does, CISPO clips the importance weights themselves, allowing every token in a trajectory to contribute information while still controlling variance. In internal comparisons, this approach delivered roughly a twofold speedup over a DAPO baseline. Forge also uses asynchronous scheduling and tree-structured sample merging to keep the hardware fully utilized, achieving around a fortyfold improvement in effective training throughput compared with earlier internal systems.

Training timelines for MiniMax's recent models underscore this focus on efficiency. MiniMax reports that an earlier reasoning model, M1, was trained on 512 H800 GPUs in about three weeks for roughly five hundred thirty-five thousand dollars in compute cost. The full M2.5 cycle, including large-scale RL on Forge, is said to have completed in about two months. For a frontier-class model, that is an unusually short and cost-effective training schedule, suggesting that MiniMax's emphasis on sparse architectures and aggressive RL optimization is paying off in practical terms. For customers, this efficiency shows up as lower pricing and faster iteration on new capabilities.

In practice, M2.5 shows several traits that matter for agentic use cases. Evaluations on internal office-automation benchmarks, where models must manipulate documents, spreadsheets, and email-like content, show a win rate close to sixty percent against a basket of mainstream competitors. The model often begins by decomposing a task into sub-steps before writing any code or producing final text, which improves both accuracy and token efficiency. For example, on SWE-Bench tasks, M2.5 uses fewer tokens on average than its M2.1 predecessor while still improving solve rates, indicating better planning rather than just more verbose trial-and-error.
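To make the CISPO description above more concrete, here is a minimal per-token sketch of the difference between dropping out-of-range tokens and clipping their importance weights. It is an illustration under stated assumptions, not MiniMax's implementation: the function names, the symmetric clip range of 0.2, and the toy trajectory are all hypothetical, and details such as advantage estimation and loss normalization are left out.

```python
import math


def importance_ratio(logp_new: float, logp_old: float) -> float:
    """Per-token ratio pi_new(token) / pi_old(token), computed from log-probs."""
    return math.exp(logp_new - logp_old)


def ppo_grad_coeff(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Coefficient on grad(log pi_new) under a PPO-style clipped surrogate.

    The surrogate is min(r * A, clip(r, 1 - eps, 1 + eps) * A). When the
    clipped branch is the active one, the objective is constant in r, so
    the token contributes no gradient at all.
    """
    dropped = (advantage > 0 and ratio > 1 + eps) or (
        advantage < 0 and ratio < 1 - eps
    )
    return 0.0 if dropped else ratio * advantage


def cispo_grad_coeff(
    ratio: float, advantage: float, eps_low: float = 0.2, eps_high: float = 0.2
) -> float:
    """Coefficient on grad(log pi_new) when the importance weight is clipped.

    The clipped weight is treated as a constant (stop-gradient), so every
    token keeps a bounded weight instead of being dropped. The clip values
    here are placeholder assumptions.
    """
    weight = min(max(ratio, 1 - eps_low), 1 + eps_high)
    return weight * advantage


if __name__ == "__main__":
    # Hypothetical trajectory: (logp_new, logp_old, advantage) per token.
    trajectory = [(-1.0, -1.0, 1.0), (-0.5, -1.2, 1.0), (-2.0, -1.0, -1.0)]
    for logp_new, logp_old, adv in trajectory:
        r = importance_ratio(logp_new, logp_old)
        print(
            f"ratio={r:.2f}  ppo={ppo_grad_coeff(r, adv):+.2f}  "
            f"cispo={cispo_grad_coeff(r, adv):+.2f}"
        )
```

In the toy trajectory, the second and third tokens have ratios outside the clip range, so the PPO-style rule gives them a zero coefficient, while the weight-clipping rule keeps a bounded contribution from each. That is the sense in which every token still contributes to the update.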
MiniMax is also offering M2.5 through a consumer-facing platform called MiniMax Agent, where users can assemble their own specialized agents by combining tools, instructions, and memory settings. Early usage numbers show more than ten thousand custom agents created shortly after launch. Feedback from independent open-source projects such as OpenHands notes that M2.5 can still be occasionally sloppy (for example, pushing a fix to the wrong Git branch or producing slightly off-format outputs) but also confirms that, on balance, it delivers strong performance for the cost. For many teams, that trade-off is acceptable: they can add guardrails and validation layers on top while still benefiting from very low per-task expenses.

As of its release, MiniMax frames M2.5 as a foundation for long-running, tool-using AI agents in coding, office productivity, and operations. The company has signaled plans to publish a deeper technical report on Forge and its scaling laws, including how performance changes as more environments and trajectories are added. The open questions now are whether MiniMax can keep scaling this RL-centric approach in step with larger competitors, and whether cost-optimized, agent-first models like M2.5 will define the next phase of AI adoption. For organizations that care more about deploying many reliable agents than about squeezing out the very last percentage point on synthetic benchmarks, MiniMax M2.5 already looks like a compelling and economically sustainable choice.
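As a rough check on the economics quoted earlier, the sketch below converts the hourly figures into per-million-token and annual terms. It uses only numbers stated in this article (one dollar per hour, 100 tokens per second, four instances, a ten-thousand-dollar annual budget); the helper names are hypothetical, a 365-day year is assumed, and the per-million-token figure counts only generated tokens, so it is not directly comparable to list prices that bill input and output separately.

```python
HOURS_PER_YEAR = 24 * 365  # assumes a 365-day year of continuous operation


def tokens_per_hour(tokens_per_second: float) -> float:
    """Tokens generated in one hour at a sustained rate."""
    return tokens_per_second * 3600


def usd_per_million_tokens(usd_per_hour: float, tokens_per_second: float) -> float:
    """Effective price per million generated tokens for an hourly rate."""
    return usd_per_hour / tokens_per_hour(tokens_per_second) * 1_000_000


def annual_cost(usd_per_hour: float, instances: int) -> float:
    """Cost of running `instances` copies around the clock for a year."""
    return usd_per_hour * HOURS_PER_YEAR * instances


def implied_hourly_rate(annual_budget: float, instances: int) -> float:
    """Per-instance hourly rate implied by an annual budget."""
    return annual_budget / (instances * HOURS_PER_YEAR)


if __name__ == "__main__":
    print(f"tokens per hour at 100 tok/s: {tokens_per_hour(100):,.0f}")
    print(f"USD per million tokens at $1/h: {usd_per_million_tokens(1.0, 100):.2f}")
    print(f"4 instances for a year at $1/h: {annual_cost(1.0, 4):,.0f}")
    print(f"hourly rate implied by $10k/year: {implied_hourly_rate(10_000, 4):.3f}")
```

Running the numbers this way makes the gap visible: four instances at the full one-dollar rate cost well above ten thousand dollars a year, so the annual estimate only holds at a lower hourly rate than the headline figure.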

