MIT's MeMo lets teams swap in a better LLM without retraining — and performance jumps 26%
Researchers at MIT and other universities have developed MeMo, a framework that enables large language models to acquire new knowledge without retraining, improving performance by up to 26%.

Enabling large language models (LLMs) to acquire new knowledge after training remains a major hurdle for enterprise AI. Current solutions are too expensive, too slow, or constrained by context window limits. Researchers at MIT and other universities have developed MeMo, a framework that addresses this challenge by encoding new knowledge into a dedicated smaller memory model that operates separately from the main LLM.

The MeMo framework works with both open- and closed-source models and sidesteps the complexity of retrieval-augmented generation (RAG) pipelines and full model retraining.
Experiments show that MeMo handles complex queries reliably even when retrieval pipelines are noisy. It avoids the catastrophic forgetting associated with direct fine-tuning and provides a cost-effective pathway for continuous knowledge updates. According to Armando Solar-Lezama, a co-author of the paper, "Vector databases have a fundamentally difficult job of encoding the full semantics of a chunk of text in a single vector, and then match that vector to a query, even when the relevance of the chunk... may only be apparent in the context of other chunks."

The MeMo framework introduces a modular architecture with two separate components: the MEMORY model, a small language model trained specifically to encode new knowledge into its parameters, and the EXECUTIVE model, a frozen, off-the-shelf LLM that functions as the reasoning engine. The MEMORY model is fine-tuned on a dataset of targeted question-answer pairs, called "reflections," which are designed to cover the knowledge corpus from as many angles as possible. At inference time, the interaction between the two models follows a structured, three-stage protocol.

The experiments revealed that MeMo excelled at long-document reasoning, achieving 53.58% accuracy on the NarrativeQA benchmark when paired with Google's proprietary Gemini 3 Flash.
Upgrading the reasoning engine requires zero retraining: simply switching the EXECUTIVE model from the open-source Qwen to the proprietary Gemini 3 Flash boosted MeMo's performance by 26.73% on NarrativeQA and 11.90% on the MuSiQue benchmark. According to Daniela Rus, co-author of the paper and director of the MIT Computer Science and Artificial Intelligence Lab (CSAIL), "in the same way that caching and indexing are standard components of any serious data system today, I would expect memory models to become a standard architectural component alongside retrieval."

However, there are several key limitations to consider when deploying MeMo. Unlike traditional RAG systems that quickly index raw documents into a vector database, MeMo incurs an upfront training cost for each new corpus.
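
To make the two-model split concrete, here is a minimal Python sketch of how a MEMORY model and a swappable EXECUTIVE model might be wired together. It is not the researchers' code. The article does not say what the three stages of the protocol are, so the plan, recall, and answer stages below are an assumption for illustration, as are the names `MeMoPipeline`, `with_executive`, and the toy stand-in models.

```python
from dataclasses import dataclass
from typing import Callable

# A "model" here is any callable that maps a prompt string to a response string.
Model = Callable[[str], str]


@dataclass(frozen=True)
class MeMoPipeline:
    """Hypothetical wrapper pairing a small memory model with a frozen executive."""
    memory: Model      # small model fine-tuned on "reflections" (QA pairs)
    executive: Model   # frozen, off-the-shelf LLM used as the reasoning engine

    def answer(self, question: str) -> str:
        # Stage 1 (assumed): the executive breaks the question into lookups.
        plan = self.executive(f"PLAN: {question}")
        sub_questions = [q.strip() for q in plan.split("\n") if q.strip()]
        # Stage 2 (assumed): the memory model answers each lookup from its weights.
        recalled = [self.memory(q) for q in sub_questions]
        # Stage 3 (assumed): the executive reasons over the recalled facts.
        facts = "\n".join(recalled)
        return self.executive(f"ANSWER: {question}\nFACTS:\n{facts}")

    def with_executive(self, new_executive: Model) -> "MeMoPipeline":
        # Swapping the reasoning engine reuses the same trained memory model.
        return MeMoPipeline(memory=self.memory, executive=new_executive)


# Toy stand-ins so the sketch runs without any real model or API.
def toy_memory(prompt: str) -> str:
    facts = {"Who wrote the report?": "The report was written by Ada."}
    return facts.get(prompt, "unknown")


def toy_executive(prompt: str) -> str:
    if prompt.startswith("PLAN:"):
        return prompt[len("PLAN:"):].strip()  # one lookup: the question itself
    return prompt.split("FACTS:\n", 1)[1]     # echo the recalled facts


def other_executive(prompt: str) -> str:
    out = toy_executive(prompt)
    return out if prompt.startswith("PLAN:") else out.upper()


pipeline = MeMoPipeline(memory=toy_memory, executive=toy_executive)
upgraded = pipeline.with_executive(other_executive)
print(pipeline.answer("Who wrote the report?"))
print(upgraded.answer("Who wrote the report?"))
```

The design point the sketch tries to capture is that the memory model is the only trained component. Replacing the executive is a constructor argument, which mirrors the article's claim that upgrading the reasoning engine needs no retraining.
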
Source: VentureBeat