Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context

AI News Desk

Hugging Face

May 14, 2026

IBM has released Granite Embedding Multilingual R2, a pair of Apache 2.0 multilingual embedding models with 97M and 311M parameters that offer improved retrieval quality and 32K-token context support.

Multilingual embedding models have long faced a trade-off: broad language coverage usually requires a larger model, while compact models tend to support fewer languages. IBM's newly released Granite Embedding Multilingual R2 models aim to narrow this gap. The release comprises two models, both built on the ModernBERT architecture and licensed under Apache 2.0.

The models support over 200 languages, with enhanced retrieval quality for 52 languages and for programming code. They handle context lengths of up to 32,768 tokens, 64 times the limit of their predecessors. They work with the sentence-transformers and transformers libraries and can be integrated into LangChain, LlamaIndex, Haystack, and Milvus.
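As a rough illustration of the sentence-transformers integration mentioned above, the sketch below embeds a query and a few passages in different languages and ranks the passages by cosine similarity. The repository ID `ibm-granite/granite-embedding-97m-multilingual-r2` is an assumption derived from the model name, and the helper names (`cosine`, `rank`, `embed`, `demo`) are hypothetical; check the model card for the exact identifier and any recommended settings.

```python
import math

# Assumed Hugging Face repo ID; verify against the model card before use.
MODEL_ID = "ibm-granite/granite-embedding-97m-multilingual-r2"


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


def embed(texts, model_id=MODEL_ID):
    """Encode texts with sentence-transformers (downloads the model on first use)."""
    # Imported lazily because it is a heavy, optional dependency.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return model.encode(texts).tolist()


def demo():
    """Cross-lingual retrieval: an English query against mixed-language passages."""
    docs = [
        "Berlin ist die Hauptstadt von Deutschland.",
        "El gato duerme en el sofá.",
        "Paris is the capital of France.",
    ]
    query_vec = embed(["What is the capital of Germany?"])[0]
    for i in rank(query_vec, embed(docs)):
        print(docs[i])
```

Calling `demo()` requires the `sentence-transformers` package and network access to fetch the model. The ranking helpers are plain Python, so they can be tried on toy vectors without downloading anything.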

The standout is the 97M-parameter granite-embedding-97m-multilingual-r2, which scores 60.3 on Multilingual MTEB Retrieval across 18 languages, ahead of every other open multilingual embedding model under 100M parameters. The larger 311M model, granite-embedding-311m-multilingual-r2, scores 65.2 on the same benchmark, making it the second-best open model under 500M parameters on that measure.

The Granite Embedding Multilingual R2 models were built from the ground up on the ModernBERT encoder architecture, which brings several practical benefits: alternating attention lengths, rotary position embeddings, and Flash Attention 2.0 support. The models also use new multilingual tokenizers: the 311M model uses the Gemma 3 tokenizer, and the 97M model uses a compact 180K-token vocabulary.
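The article does not spell out what "alternating attention lengths" means. One reading, consistent with the published ModernBERT design, is that most layers attend only within a local sliding window while every few layers attend over the full sequence. The sketch below illustrates that idea; the window size of 128, the every-third-layer schedule, and the function names are illustrative assumptions, not values confirmed for the Granite models.

```python
def layer_uses_global_attention(layer_idx, global_every=3):
    """True if this layer attends over the full sequence (assumed schedule)."""
    return layer_idx % global_every == 0


def allowed_positions(query_pos, seq_len, layer_idx, window=128, global_every=3):
    """Key positions a token at query_pos may attend to in the given layer."""
    if layer_uses_global_attention(layer_idx, global_every):
        return range(seq_len)
    # Local layer: a symmetric window centered on the query position.
    half = window // 2
    return range(max(0, query_pos - half), min(seq_len, query_pos + half + 1))
```

Under these assumptions, a token in the middle of a 32,768-token input sees all 32,768 positions in a global layer but only 129 in a local one, which is why this kind of schedule keeps long-context inference affordable.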

Both models were trained on a mixture of IBM-curated datasets, publicly available data, and internally generated or synthetic data. They were pretrained using GneissWeb, an IBM-curated dataset derived from publicly available web content, and they support cross-lingual code retrieval. The 311M model also supports Matryoshka Representation Learning, which allows embeddings to be truncated to fewer dimensions without significant loss in retrieval quality and so substantially reduces storage and computation costs.

The models are available on Hugging Face in the IBM Granite Embedding collection, accompanied by detailed technical reports and deployment examples, so framework maintainers and users can begin integrating them into their projects.
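To make the Matryoshka Representation Learning savings concrete, the sketch below truncates an embedding to its first few dimensions, rescales it to unit length, and estimates raw index size. The dimensions used in the note that follows (768 and 256) are hypothetical, since the article does not state the models' embedding sizes, and the function names are illustrative.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


def index_size_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw size of a float32 vector index, ignoring any index overhead."""
    return num_vectors * dim * bytes_per_value
```

With these hypothetical numbers, one million 768-dimensional float32 vectors take about 3.07 GB, while the same vectors truncated to 256 dimensions take about 1.02 GB. The sentence-transformers library also accepts a `truncate_dim` argument when constructing a `SentenceTransformer`, which may be a more convenient way to apply the same idea.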

Source: Hugging Face