AI Models

Moonshot AI Releases Kimi K2.7-Code: a Coding Model Reporting +21.8% on Kimi Code Bench v2 Over K2.6

AI News Desk

MarkTechPost

Jun 13, 2026

3 min read

This week, Moonshot AI released Kimi K2.7-Code .

Moonshot AI Releases Kimi K2.7-Code: a Coding Model Reporting +21.8% on Kimi Code Bench v2 Over K2.6">

This week, Moonshot AI released Kimi K2.7-Code . It is a coding-focused, agentic model. The model weights ship on Hugging Face under a Modified MIT license. You can also reach it through the Kimi API and Kimi Code.

K2.7-Code targets long-horizon software engineering, not general chat. It plans, edits, runs tools, and debugs across many steps. Moonshot pairs the model with a subscription coding platform around it.

K2.7-Code is a Mixture-of-Experts model. It holds 1T total parameters and activates 32B per token. The design uses 384 experts, with 8 selected per token and 1 shared. It has 61 layers, including 1 dense layer.

Attention uses MLA, and the feed-forward path uses SwiGLU. A MoonViT vision encoder adds 400M parameters for image and video input. The model ships with native INT4 quantization. The context window is 256K tokens (262,144).

Two constraints matters: Thinking mode is mandatory; disabling it returns an API error. Sampling is fixed: temperature 1.0, top_p 0.95, n 1, penalties 0.0. Default max output is 32,768 tokens.

You can self-host with vLLM, SGLang, or KTransformers. The Hugging Face repository is large, roughly 595 GB on disk. This is a server-class deployment target, not a laptop model.

Moonshot team published six benchmark rows. They compare K2.7-Code against K2.6, GPT-5.5, and Claude Opus 4.8. K2.7-Code beats K2.6 on every row. The largest coding jump is Kimi Code Bench v2, from 50.9 to 62.0.

K2.7-Code does beat Opus 4.8 on MCP Mark Verified, 81.1 versus 76.4. It also lands close to GPT-5.5 on MLS Bench Lite. K2.7-Code ran in Kimi Code CLI, GPT-5.5 in Codex xhigh, and Opus 4.8 in Claude Code xhigh.

Moonshot team reports about 30% lower reasoning-token usage than K2.6. It frames this as ‘less overthinking.’

Reasoning tokens bill as output tokens on most price cards. Agentic coding runs hundreds or thousands of steps. Each plan, retry, and verification pays the thinking cost again. A 30% cut compounds across a long run.

The effect lands in three places at once. First, lower output-token cost per task. Second, faster steps, which helps interactive CLI sessions. Third, more steps before hitting context limits.

Share this article

X LinkedIn Telegram

Source: MarkTechPost