Liquid AI Introduces LFM2.5-Embedding-350M and LFM2.5-ColBERT-350M: Dense Bi-Encoder and Late-Interaction Models for Fast Multilingual Search Across 11 Languages

This week, Liquid AI released two new retrieval models. They are LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M. Both hold 350M parameters. Both are the first bidirectional members of the LFM family. They build on LFM2.5-350M-Base, released in March. The pair targets fast multilingual and cross-lingual search across 11 languages. Their footprint is small enough to run almost anywhere. Both are available now on Hugging Face under the LFM Open License v1.0.
The two models share one backbone but represent text differently. LFM2.5-Embedding-350M is a dense bi-encoder. It turns each document into a single vector. Pick it when you want the fastest search and the smallest, cheapest index.
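To make the single-vector idea concrete, here is a minimal sketch of dense retrieval by cosine similarity. The three-dimensional vectors and document names are made-up stand-ins for real model outputs, and `dense_search` is a hypothetical helper, not part of any Liquid AI API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def dense_search(query_vec, doc_vecs, top_k=2):
    """Rank documents by similarity of their single vector to the query vector."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:top_k]]

# Toy index: one vector per document (illustrative values only).
doc_vecs = {
    "returns-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "warranty": [0.7, 0.2, 0.6],
}
query_vec = [1.0, 0.0, 0.1]
ranking = dense_search(query_vec, doc_vecs)
```

Because each document is a single vector, the index grows by one fixed-size row per document, which is what keeps it small and cheap to scan.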
LFM2.5-ColBERT-350M is a late-interaction model. It produces one vector per token rather than one vector per document. This lets it match queries token by token for higher accuracy and better generalization. The trade-off is a larger index. Pick it when accuracy matters more than storage. Its query length is capped at 32 tokens. It can also rerank a first-stage retriever’s results without building an index.
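Late-interaction scoring in the ColBERT style is usually done with MaxSim: for every query token, take its best-matching document token, then sum those maxima. The sketch below shows that scoring and index-free reranking on toy two-dimensional token vectors. The vectors and the `rerank` helper are illustrative assumptions, not the model's actual outputs or API.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best dot product with any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

def rerank(query_tokens, candidates):
    """Reorder first-stage candidates by MaxSim score; no index is required."""
    return sorted(
        candidates,
        key=lambda doc_id: maxsim(query_tokens, candidates[doc_id]),
        reverse=True,
    )

# Toy per-token embeddings (illustrative values only).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "doc_b": [[1.0, 0.0], [0.9, 0.1]],               # matches only the first query token well
    "doc_a": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],   # matches both query tokens
}
reranked = rerank(query_tokens, candidates)
```

Here `doc_a` scores 2.0 and `doc_b` about 1.1, so the reranker promotes `doc_a` even though it arrived second from the first stage.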
Both target short-context search. Good fits include product catalogs, FAQ knowledge bases, and support docs. Liquid AI positions both as drop-in replacements for the retrieval model in an existing RAG pipeline.
Both models start from LFM2.5-350M-Base, a mid-trained general-purpose checkpoint. Liquid AI applies a small set of bidirectional patches to the LFM2 architecture. These adapt it from a causal decoder to a bidirectional encoder.
In a causal setup, each token uses only itself and previous tokens. That suits left-to-right generation but is less natural for retrieval. The team replaces the causal attention mask with a bidirectional one. Now every token can attend to both left and right context. They also make the LFM2 short convolutions non-causal. These mix local information symmetrically around each token, not only from the past.
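The difference between the causal and bidirectional variants can be illustrated on a toy sequence. This is a generic sketch, not Liquid AI's actual patch: the function names, the zero padding at the sequence edges, and the odd kernel length are all assumptions made for illustration.

```python
def attention_mask(n, causal):
    """mask[i][j] is True when token i may attend to token j."""
    if causal:
        return [[j <= i for j in range(n)] for i in range(n)]
    return [[True] * n for _ in range(n)]

def short_conv(x, kernel, causal):
    """1-D convolution over a sequence, with zero padding at the edges.

    causal=True:  output[i] mixes x[i-k+1 .. i] (past and present only).
    causal=False: output[i] mixes a window centered on i (assumes odd k).
    """
    k = len(kernel)
    offset = k - 1 if causal else k // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for t, w in enumerate(kernel):
            j = i - offset + t
            if 0 <= j < len(x):  # positions outside the sequence count as zero
                acc += w * x[j]
        out.append(acc)
    return out

x = [1, 2, 3, 4]
kernel = [1, 1, 1]
causal_out = short_conv(x, kernel, causal=True)     # [1.0, 3.0, 6.0, 9.0]
centered_out = short_conv(x, kernel, causal=False)  # [3.0, 6.0, 9.0, 7.0]
```

In the causal output the first position sees only itself, while in the centered output it already includes its right-hand neighbor, which is the symmetric local mixing described above.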
This preserves the LFM2 backbone’s efficiency while producing the full-context representations retrieval needs. Each model has 17 layers: 10 convolution, 6 attention, and 1 pooling or dense. Context length reaches 32,768 tokens, though documents are tuned to 512 tokens. From the shared encoder, the two models differ only in output. Embedding uses CLS-style pooling for one 1024-dim vector. ColBERT keeps 128-dim per-token embeddings for MaxSim late interaction.
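The output dimensions above also explain the index-size trade-off between the two models. The back-of-the-envelope calculation below assumes uncompressed float32 storage and a document that uses the full 512 tokens. Real deployments often quantize or compress vectors and most documents are shorter, so treat the numbers as an upper-bound illustration, not measured figures.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored uncompressed as float32

def dense_bytes_per_doc(dim=1024):
    """One vector per document (Embedding model)."""
    return dim * BYTES_PER_FLOAT32

def late_interaction_bytes_per_doc(num_tokens=512, dim=128):
    """One vector per token (ColBERT model), for a full-length document."""
    return num_tokens * dim * BYTES_PER_FLOAT32

dense = dense_bytes_per_doc()            # 4096 bytes (4 KiB)
late = late_interaction_bytes_per_doc()  # 262144 bytes (256 KiB)
ratio = late // dense                    # 64
```

Under these assumptions a full-length document takes 64 times more raw storage in the late-interaction index, which is the cost paid for token-level matching.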
Both models follow the same three-stage recipe:
Source: MarkTechPost