AI Models

JetBrains Unveils Mellum2: A 12B MoE Model for Fast, Specialized Tasks in Multi-Model AI Pipelines

AI News Desk

MarkTechPost

Jun 02, 2026

2 min read

JetBrains has released Mellum2, an open-source 12B Mixture-of-Experts model designed for fast, specialized tasks in software engineering, under the Apache 2.0 license.

JetBrains Unveils Mellum2: A 12B MoE Model for Fast, Specialized Tasks in Multi-Model AI Pipelines

JetBrains has released Mellum2, the successor to its completion-focused 4B dense model, Mellum. This new general-purpose model is specialized in software engineering, covering a range of tasks including code generation and editing, debugging, multi-step reasoning, tool use and function calling, agentic coding, and conversational programming assistance. The JetBrains team positions Mellum2 as a 'focal model' — a fast, specialized component inside larger AI systems, not a standalone replacement for frontier models.

It uses a Mixture-of-Experts (MoE) architecture with 12B total parameters and 2.5B active parameters per token. This architecture allows only a subset of parameters to run on each token, keeping per-token compute equivalent to a 2.5B dense model while providing higher capacity for specialization. Mellum2 was pre-trained on approximately 10.6 trillion tokens through a three-phase curriculum, progressively shifting from diverse web content toward curated code and mathematical content.

The model was trained using the Muon optimizer under FP8 hybrid precision with a Warmup-Hold-Decay learning rate schedule with linear decay to zero. After pre-training, the base model's context window was extended to 128K tokens using a layer-selective YaRN method before post-training began. The JetBrains team released six checkpoints covering the full training pipeline, including two post-training stages: supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR) on various tasks.

The Instruct variant of the model answers directly, without an externalized chain of thought, making it suitable for low-latency tasks. The Thinking variant emits an explicit reasoning trace before its final answer, making it more suitable for complex debugging, multi-step planning, or agentic flows. According to JetBrains, Mellum2 is designed for efficiency in component roles, not frontier-level capability across all benchmarks.

The model is available for use, with installation instructions provided for serving the Instruct variant with vLLM and enabling tool-calling with the hermes parser. The model weights are available on Hugging Face, and a technical report can be found on arXiv.

Share this article

X LinkedIn Telegram

Source: MarkTechPost