Meet ‘North Mini Code’: Cohere’s 30B Open-Weight Mixture-of-Experts Model With 3B Active Parameters for Agentic Coding
This week, Cohere AI team shipped its first developer-facing coding model named ‘ North Mini Code ‘.

North Mini Code’: Cohere’s 30B Open-Weight Mixture-of-Experts Model With 3B Active Parameters for Agentic Coding">
This week, Cohere AI team shipped its first developer-facing coding model named ‘ North Mini Code ‘. ‘North Mini Code’ is open-weight and focused at software engineers. It is a mixture-of-experts (MoE) model with 30B total parameters. Only 3B of those parameters activate per token.
The release is positioned around “sovereign” AI. The idea is simple: run capable models on your own terms. Small, efficient coding models let teams self-host without large GPU clusters. North Mini Code targets that gap directly.
North Mini Code is a 30B-A3B parameter model. The A3B stands for three billion active parameters per forward pass. Cohere optimized it for three jobs: code generation, agentic software engineering, and terminal tasks . The model is text-in, text-out. There is no image or video input.
The context window is 256K tokens. Maximum output length is 64K tokens. Cohere lists a minimum hardware bar of one H100 at FP8. Weights ship under Apache 2.0 on Hugging Face. You can also reach it through the Cohere API, Model Vault, and OpenRouter.
North Mini Code is a decoder-only Transformer with sparse MoE layers. Its attention interleaves two types in a 3:1 ratio. Sliding-window attention uses RoPE for positions. Global attention uses no positional embeddings at all. The feed-forward block holds 128 experts. Eight experts activate per token. Each expert is an FFN with SwiGLU activation.
The router applies a sigmoid before top-k selection. A single dense layer sits before the sparse layers. That mix keeps active compute small while widening total capacity. Cohere released the weights in BF16.
Post-training ran in two phases. First came two-stage cascaded supervised fine-tuning (SFT). Then came reinforcement learning with verifiable rewards (RLVR). The post-training focused on agentic coding. The model also supports interleaved thinking and native tool use.
Cohere reports a 33.4 on the Artificial Analysis Coding Index. It describes this as a competitive position among similarly sized models. The company evaluated on SWE-Bench Verified, SWE-Bench Pro, and Terminal-Bench v2. It also used Terminal-Bench Hard, SciCode, and LiveCodeBench v6.
The methodology is specific. SWE-Bench used the SWE-agent harness v1.1.0. Terminal-Bench v2 used a simple ReAct harness with one terminal tool. Terminal-Bench Hard used the Terminus-2 harness. Each benchmark ran with three seeds, then averaged. Sampling used temperature 1.0 and top_p 0.95.
Source: MarkTechPost