Meet ZAYA1-8B, a Super-Efficient, Open Reasoning Model Trained on AMD Instinct MI300 GPUs
Zyphra releases ZAYA1-8B, an open, efficient reasoning model with 8 billion parameters, trained on AMD Instinct MI300 GPUs, achieving competitive performance with far fewer parameters.

Even as leading AI providers like OpenAI and Anthropic battle over the compute to train and release ever larger, more powerful models, other labs are going in a different direction, pursuing smaller, more efficient models and often open-sourcing them. The latest worth paying attention to comes from the lesser-known Palo Alto startup Zyphra, which this week released its new reasoning, mixture-of-experts (MoE) language model, ZAYA1-8B, with just over 8 billion total parameters and only 760 million active per token, far fewer than the trillions estimated for the models from the big labs.

Yet ZAYA1-8B remains competitive on third-party benchmarks against GPT-5-High and DeepSeek-V3.2. It can be downloaded from Hugging Face now, free of charge, under the permissive, enterprise-friendly Apache 2.0 license, and enterprises and indie developers can begin using and customizing it immediately to suit their needs.

Individual users can also test it themselves free at Zyphra Cloud, the startup's inference solution. But the real headline is what ZAYA1-8B was trained on: a full stack of AMD Instinct MI300 graphics processing units (GPUs), the rival to Nvidia GPUs that AMD released nearly three years ago. The result shows that the platform is capable of producing useful models and is a viable alternative to the preferential position Nvidia has maintained in recent years among AI model developers.

The "intelligence density" touted by Zyphra is the result of what the company describes as a "full-stack innovation" approach spanning architecture, pretraining, and reinforcement learning (RL). ZAYA1-8B is built on Zyphra's proprietary MoE++ architecture, described in a technical report released by the lab. This architecture introduces three fundamental changes to the standard Transformer architecture that gave rise to large language models (LLMs) and the entire generative AI era: Compressed Convolutional Attention (CCA), the ZAYA1 MLP Router, and Learned Residual Scaling.

A critical differentiator for ZAYA1-8B is that reasoning was integrated from the start of pretraining, rather than being bolted on during post-training. To handle long chain-of-thought (CoT) traces that would otherwise exceed the initial 4K-token pretraining context, Zyphra developed Answer-Preserving (AP) Trimming. The model's most significant performance leap comes from Markovian RSA, a novel test-time compute (TTC) methodology.
This allows ZAYA1-8B, with its 760 million active parameters, to achieve a 91.9% score on AIME '25, closing the gap with models that have 30 to 50 times its active parameter count.

Benchmarks show a remarkably performant small model: Zyphra is positioning ZAYA1-8B as a "punch above its weight" option for developers who need high-tier reasoning without the latency or cost of massive frontier models. It beats Qwen3-4B and Gemma-3n-E4B on math and code benchmarks.
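The arithmetic behind the 8-billion-total / 760-million-active split is the defining property of sparse MoE models: a router sends each token to only a few of the many expert networks, so most parameters sit idle on any given forward pass. The toy top-k router below is a generic illustration of that idea only; it is not Zyphra's MoE++ architecture or its ZAYA1 MLP Router, and all sizes are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 64    # hidden size (toy value)
n_experts = 16  # total expert MLPs in the layer
top_k = 2       # experts actually run per token
d_ff = 128      # expert hidden width

# Each expert is a small two-layer MLP: d_model -> d_ff -> d_model.
experts_w1 = rng.standard_normal((n_experts, d_model, d_ff)) * 0.02
experts_w2 = rng.standard_normal((n_experts, d_ff, d_model)) * 0.02
router_w = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route one token through its top-k experts and mix their outputs."""
    logits = x @ router_w                 # one score per expert
    chosen = np.argsort(logits)[-top_k:]  # indices of the top-k experts
    gates = np.exp(logits[chosen])
    gates /= gates.sum()                  # softmax over the chosen k only
    out = np.zeros_like(x)
    for gate, e in zip(gates, chosen):
        h = np.maximum(x @ experts_w1[e], 0)  # ReLU MLP for expert e
        out += gate * (h @ experts_w2[e])
    return out

token = rng.standard_normal(d_model)
y = moe_forward(token)

# Only top_k of n_experts expert blocks run, so the active fraction
# of expert parameters is top_k / n_experts (here 2/16 = 0.125).
total = experts_w1.size + experts_w2.size
active = top_k * (experts_w1[0].size + experts_w2[0].size)
print(f"active expert-parameter fraction: {active / total:.3f}")
```

Scaled up, the same logic is why a model can carry 8B parameters of capacity while spending the per-token compute of a model roughly a tenth that size.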
Source: VentureBeat