MiniMax-M3 debuts, eclipsing GPT-5.5 and Gemini 3.1 Pro on key benchmark performance for just 5-10% of the cost
Chinese AI startup MiniMax releases its highly anticipated M3 large language model, pairing frontier-tier coding and agentic performance with a 1-million-token context window and native multimodality at a fraction of the cost of leading proprietary models.

Big news in enterprise AI broke over the weekend: Chinese AI startup MiniMax released its highly anticipated M3 large language model on Sunday evening Eastern time. The model pairs frontier-tier coding and agentic performance with a 1-million-token context window and native multimodality for a fraction of the cost of leading proprietary models, with pricing starting at just $20 per month under the company's new subscription token plans. The company's leadership also announced plans to release the model under an open source license, including "open weights," sometime in the next 10 days, allowing enterprises to download and customize it free of charge.
For now, it is available via the MiniMax API at a special discounted price of $0.30 per million input tokens and $1.20 per million output tokens (on fresh cache) for the next week. That beats proprietary U.S. giants like Google, OpenAI and Anthropic handily on cost, while also eclipsing the performance of the latest models from the former two on selected benchmarks. Even at its full price of $0.60/$2.40 per million input/output tokens, MiniMax-M3 remains at just 8-20% the cost of the leading proprietary U.S. models.
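To make the per-token pricing concrete, here is a minimal Python sketch that prices a workload at the promotional and full rates quoted above. The function name `request_cost` and the 10-million-input, 2-million-output workload are illustrative assumptions, not figures from MiniMax, and the sketch ignores any separate cached-input pricing.

```python
# Prices in USD per 1 million tokens, as quoted in the article.
PROMO = {"input": 0.30, "output": 1.20}  # one-week discounted rate
FULL = {"input": 0.60, "output": 2.40}   # regular rate

def request_cost(prices, input_tokens, output_tokens):
    """Return the USD cost of a workload at the given per-million-token prices."""
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000

# Hypothetical workload (an assumption for illustration only).
input_tokens, output_tokens = 10_000_000, 2_000_000

print(f"Promo: ${request_cost(PROMO, input_tokens, output_tokens):.2f}")
print(f"Full:  ${request_cost(FULL, input_tokens, output_tokens):.2f}")
```

Because the full rates are exactly double the promotional ones, any workload costs twice as much once the discount ends, regardless of its input/output mix.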
Large language model development has long forced a rigid choice: software developers can either access top-tier closed-source intelligence behind restrictive APIs, or deploy nimble, cost-effective open models that falter on multi-step reasoning, dense coding tasks, and very long inputs. MiniMax-M3 fundamentally upends this trade-off. By combining frontier-level capability with open, low-cost deployment, M3 brings a level of utility previously restricted to expensive, closed-source ecosystems, raising the baseline for open-weights systems while sharply reducing the compute needed to run complex development loops.
The new MiniMax Sparse Attention (MSA) technique helps keep the model's cost low. At the core of the model's efficiency lies an architectural departure from classic Transformer networks. Standard attention mechanisms scale quadratically, $O(N^2)$, with the sequence length $N$, meaning computational and financial costs explode as text inputs lengthen.
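As a rough illustration of that quadratic growth, the Python sketch below counts the query-key scores a dense attention layer computes per head for a few sequence lengths. The function name `attention_pairs` and the chosen lengths are assumptions for illustration; the sketch describes standard dense attention only and says nothing about how MSA itself works.

```python
def attention_pairs(n_tokens):
    """Number of query-key scores in dense self-attention (per head, per layer)."""
    # Every token attends to every token, so the score matrix is N x N.
    return n_tokens * n_tokens

# Illustrative sequence lengths, up to the 1-million-token window cited above.
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9,} tokens -> {attention_pairs(n):,} pairwise scores")
```

Doubling the input length quadruples the number of scores, which is why long-context workloads become disproportionately expensive under dense attention.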
To combat this "inherent flaw," the engineering team implemented MiniMax Sparse Attention (MSA), a clean, extensible sparse attention blueprint.
Rather than taking a pretrained text network and fusing it with a separate vision model, MiniMax engineered M3 as a natively multimodal system from "Step Zero." The company overhauled its data ingestion pipeline to blend naturally interleaved sequences of text, images, and visual components, scaling the total pretraining corpus beyond 100 trillion tokens.
Source: VentureBeat