Liquid AI releases smallest AI model yet, LFM2.5-230M, for data extraction and local deployment
Liquid AI's LFM2.5-230M model outperforms larger models in data extraction and can run on local devices.

Liquid AI, founded by former MIT computer scientists, today released its smallest AI language model yet, LFM2.5-230M. This 230-million-parameter foundation model is designed for on-device agentic workflows and can run nearly 'anywhere.' According to Liquid, it outperforms models up to roughly four times its size on selected benchmarks, doing better at data extraction than Alibaba's 800-million-parameter Qwen3.5-0.8B (Instruct) and Google's 1-billion-parameter Gemma 3 1B. The model targets developers and engineers building lightweight data extraction pipelines and autonomous edge systems.
Released under a dual-use commercial license, the model remains free for individuals and companies generating less than $10 million in annual revenue, while larger corporations need a paid enterprise agreement. The release distinguishes itself from other small AI models by using the LFM2 architecture to achieve high inference speeds without the massive memory overhead typical of parameter-heavy transformers. Liquid AI's launch of LFM2.5-230M signals a pivotal shift toward architectural efficiency over brute-force scaling.
The LFM2.5-230M model diverges from standard transformer architectures, relying instead on the LFM2 framework. This architecture functions as a hybrid system, interleaving gated short-range convolutions with grouped-query attention to process information efficiently. The model supports an expansive 32K context window, allowing it to ingest substantial documents or continuous streams of robotic telemetry.
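To make the two building blocks named above more concrete, here is a minimal Python sketch of a gated short-range convolution and of the head sharing behind grouped-query attention. It is an illustration under stated assumptions, not Liquid AI's implementation: the function names, the fixed gate values, the kernel, and the interleaving ratio in `build_layer_plan` are all hypothetical, and a real model would compute the gates from the input with learned projections.

```python
from typing import List


def causal_short_conv(x: List[float], kernel: List[float]) -> List[float]:
    """Causal 1-D convolution: out[t] mixes x[t], x[t-1], ... only."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            if t - j >= 0:  # positions before the sequence start count as zero
                acc += w * x[t - j]
        out.append(acc)
    return out


def gated_short_conv(
    x: List[float],
    in_gate: List[float],
    out_gate: List[float],
    kernel: List[float],
) -> List[float]:
    """Gate the input, mix a short local window, then gate the output.

    Assumption: gates are passed in as plain numbers for illustration.
    """
    gated = [g * v for g, v in zip(in_gate, x)]
    mixed = causal_short_conv(gated, kernel)
    return [g * v for g, v in zip(out_gate, mixed)]


def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Grouped-query attention: consecutive query heads share one key/value head."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    return q_head // (n_q_heads // n_kv_heads)


def build_layer_plan(n_layers: int, attention_every: int) -> List[str]:
    """Hypothetical interleaving: every `attention_every`-th layer is attention."""
    return [
        "gqa" if (i + 1) % attention_every == 0 else "conv"
        for i in range(n_layers)
    ]


if __name__ == "__main__":
    # A closed input gate at position 1 removes that token from the local mix.
    print(gated_short_conv([1.0, 2.0, 3.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0], [0.5, 0.5]))
    print(kv_head_for_query_head(5, n_q_heads=8, n_kv_heads=2))
    print(build_layer_plan(6, attention_every=3))
```

The convolution only looks at a short window of recent tokens, so its cost grows linearly with sequence length, while sharing key/value heads across query heads shrinks the attention cache. Both properties are commonly cited reasons hybrid designs of this kind use less memory than a plain transformer of similar size.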
The performance charts provided in the release make the architectural efficiency visually apparent. The model maintains a memory footprint of under 400MB while achieving prefill and decode speeds that outpace comparable models like Gemma 3 1B IT and Granite 4.0-H-350M. On a Samsung Galaxy S25 Ultra equipped with a Qualcomm Snapdragon Gen4 CPU, the model reaches a decode speed of 213 tokens per second.
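Some back-of-the-envelope arithmetic helps put the memory and speed figures in context. The sketch below estimates the size of the weights alone at several precisions and the time needed to decode a response at a given rate. The bit widths and the 500-token response length are illustrative assumptions, not figures from the release, and a real footprint also includes the attention cache and runtime overhead.

```python
def weight_memory_mb(n_params: int, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in megabytes (1 MB = 10**6 bytes)."""
    return n_params * bits_per_weight / 8 / 1e6


def decode_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Time to generate n_tokens at a steady decode rate."""
    return n_tokens / tokens_per_second


if __name__ == "__main__":
    n_params = 230_000_000  # parameter count stated in the article

    # Assumed precisions, for illustration only.
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: {weight_memory_mb(n_params, bits):.0f} MB")

    # Hypothetical 500-token response at the reported phone decode rate.
    print(f"500 tokens at 213 tok/s: {decode_seconds(500, 213.0):.2f} s")
```

At 16 bits per weight the parameters alone would come to about 460MB, so a footprint under 400MB would be consistent with weights stored at lower precision, although the article does not say which format the measurements used.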
Even on a highly constrained Raspberry Pi 5, the model maintains a decode rate of 42 tokens per second.
To understand why a 230-million-parameter model is necessary, one must look at how enterprises currently manage data. Organizations have traditionally relied on rigid, rule-based Extract, Transform, Load (ETL) scripts to move and process data.
However, these legacy systems are notoriously brittle; a simple change in a document's layout or a schema update can break the entire pipeline. For enterprises, using a massive flagship model like Claude Opus 4.6 (which costs $5.00 per million input tokens) to parse routine invoices, format addresses, or route telemetry data is economically unviable. This is where models like LFM2.5-230M become critical.
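The cost argument can be made concrete with a short calculation. The sketch below uses the $5.00 per million input tokens quoted above; the document volume and the tokens-per-invoice figure are hypothetical values chosen only for illustration, and output-token charges are ignored.

```python
def input_cost_usd(
    n_documents: int,
    tokens_per_document: int,
    usd_per_million_tokens: float = 5.00,
) -> float:
    """Input-token cost of sending every document to a hosted model."""
    total_tokens = n_documents * tokens_per_document
    return total_tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # Hypothetical workload: one million invoices a month, 1,500 tokens each.
    monthly = input_cost_usd(n_documents=1_000_000, tokens_per_document=1_500)
    print(f"Monthly input cost: ${monthly:,.2f}")
```

Under these assumed numbers the input tokens alone cost $7,500 a month, a recurring bill that a model running on local hardware would avoid in exchange for the up-front cost of that hardware.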
Designed explicitly as a lightweight extraction engine, LFM2.5-230M allows companies to automate repetitive formatting and data parsing at a fraction of the compute cost and latency, running directly on local hardware rather than relying on expensive, continuous cloud API calls.
The AI industry in mid-2026 is seeing a renaissance in 'small' models, but the definition of 'small' varies wildly. Against that backdrop, Liquid AI's LFM2.5-230M operates in a completely different weight class.
Source: VentureBeat