
MEMO: A Modular Framework for Training a Dedicated Memory Model on New Knowledge Without Modifying LLM Parameters

AI News Desk

MarkTechPost

May 27, 2026


Researchers propose MEMO, a modular framework that trains a dedicated memory model to internalize new knowledge without modifying large language model (LLM) parameters.

Large language models become static after pretraining, and their knowledge does not update as the world changes. Retraining a full LLM is too expensive at modern scales, and fine-tuning risks degrading previously learned knowledge. Retrieval-augmented generation (RAG) struggles when answers require reasoning across many documents.

A team of researchers from the National University of Singapore, MIT CSAIL, A*STAR, and the Singapore-MIT Alliance for Research and Technology (SMART) proposes a new approach called MEMO (Memory as a Model). MEMO separates memory from reasoning, using a small, dedicated language model, called the MEMORY model, to internalize knowledge from a target corpus. The EXECUTIVE model, the main LLM, is frozen and queried only through its standard input-output interface.

The MEMORY model is trained via supervised fine-tuning (SFT) on a reflection QA dataset created using a five-step data synthesis pipeline guided by a GENERATOR model. At inference, the EXECUTIVE model queries the MEMORY model through a structured multi-turn protocol with three sequential stages: grounding, entity identification, and answer seeking and synthesis. In experiments, MEMO achieves state-of-the-art results on three benchmarks: BrowseComp-Plus, NarrativeQA, and MuSiQue.
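The three-stage inference protocol can be sketched as a short loop over black-box models. This is a minimal illustration under stated assumptions, not the authors' implementation: the `TextModel` interface, the `run_protocol` function, the bracketed prompt tags, and the toy stand-in models are all hypothetical, and the article does not specify what each stage actually sends to the MEMORY model.

```python
from typing import Callable, List, Tuple

# Hypothetical black-box interface: prompt text in, reply text out.
TextModel = Callable[[str], str]

STAGES = ("grounding", "entity_identification", "answer_seeking_and_synthesis")


def run_protocol(
    question: str, executive: TextModel, memory: TextModel
) -> Tuple[str, List[Tuple[str, str]]]:
    """Run three sequential stages; return the answer and a (stage, memory_reply) log."""
    log: List[Tuple[str, str]] = []

    # Stage 1, grounding: ask the memory model what it knows that bears on the question.
    grounding = memory(f"[grounding] {question}")
    log.append((STAGES[0], grounding))

    # Stage 2, entity identification: ask which entities matter, given the grounding.
    entities = memory(f"[entities] {question} | context: {grounding}")
    log.append((STAGES[1], entities))

    # Stage 3, answer seeking and synthesis: gather evidence per entity, then let
    # the frozen executive compose the final answer from text alone.
    evidence = [
        memory(f"[evidence] {name.strip()} | {question}")
        for name in entities.split(",")
        if name.strip()
    ]
    log.append((STAGES[2], " ".join(evidence)))
    answer = executive(f"Question: {question}\nEvidence: {' '.join(evidence)}")
    return answer, log


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_memory(prompt: str) -> str:
    if prompt.startswith("[grounding]"):
        return "The corpus covers the Apollo program."
    if prompt.startswith("[entities]"):
        return "Apollo 11, Neil Armstrong"
    entity = prompt.split("|")[0].replace("[evidence]", "").strip()
    return f"Fact about {entity}."


def toy_executive(prompt: str) -> str:
    return prompt.splitlines()[-1].replace("Evidence: ", "Answer based on: ")


if __name__ == "__main__":
    final, trace = run_protocol("Who landed first?", toy_executive, toy_memory)
    print(final)
    for stage, reply in trace:
        print(stage, "->", reply)
```

Both models are plain functions from text to text, so either one can be replaced without touching the other. That mirrors the article's point that the EXECUTIVE model is used only through its standard input-output interface.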

With Gemini-3-Flash as the EXECUTIVE model, MEMO achieves 53.58% on NarrativeQA, outperforming HippoRAG2's 23.21%. MEMO also demonstrates robustness to retrieval noise and supports incremental knowledge updates through model merging. The research team tests model merging on NarrativeQA using TIES merging (ρ=0.3). Compared with full retraining, merging cuts compute by 33% at K=2 corpora and by a factor of 5.5 at K=10 corpora.
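For readers unfamiliar with TIES merging, the sketch below shows its three steps (trim, elect sign, disjoint merge) on flat weight vectors using NumPy. It is a simplified illustration, not the research team's code. The function name `ties_merge` and its parameters are hypothetical, and treating ρ=0.3 as the fraction of entries kept in the trim step is an assumption, since the article does not define ρ.

```python
import numpy as np


def ties_merge(base, finetuned, density=0.3, lam=1.0):
    """Merge several fine-tuned weight vectors into one, TIES-style.

    base:      1-D array of shared starting weights
    finetuned: list of 1-D arrays, one per separately trained model
    density:   fraction of each task vector kept by the trim step (assumed meaning of rho)
    lam:       scale applied to the merged task vector
    """
    base = np.asarray(base, dtype=float)
    deltas = np.stack([np.asarray(w, dtype=float) - base for w in finetuned])  # (K, D)

    # 1) Trim: per task vector, keep only the largest-magnitude entries.
    #    Ties at the threshold may keep slightly more than k entries.
    k = max(1, int(round(density * deltas.shape[1])))
    thresh = np.sort(np.abs(deltas), axis=1)[:, -k][:, None]
    trimmed = np.where(np.abs(deltas) >= thresh, deltas, 0.0)

    # 2) Elect sign: per parameter, the sign carrying the larger total magnitude.
    elected = np.sign(trimmed.sum(axis=0))

    # 3) Disjoint merge: average only the nonzero entries that agree with that sign.
    agree = (np.sign(trimmed) == elected) & (trimmed != 0)
    counts = np.maximum(agree.sum(axis=0), 1)  # avoid division by zero
    merged_delta = (trimmed * agree).sum(axis=0) / counts

    return base + lam * merged_delta


if __name__ == "__main__":
    base = np.zeros(4)
    model_a = np.array([1.0, 0.1, -2.0, 0.0])
    model_b = np.array([-0.5, 0.2, -1.0, 3.0])
    print(ties_merge(base, [model_a, model_b], density=0.5))
```

In a real setting, each entry of `finetuned` would be the flattened weights of a MEMORY model trained on one corpus, so adding a corpus means training one new model and merging it in rather than retraining on everything.

The reported savings are consistent with a simple cost model, offered here as an assumption rather than the paper's stated accounting. If retraining from scratch after the k-th corpus arrives costs k units, while the merging route trains one unit per corpus, the totals are K(K+1)/2 versus K. The ratio is (K+1)/2, which gives 1.5× at K=2 (a 33% reduction) and 5.5× at K=10.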

Switching the EXECUTIVE model from Qwen2.5-32B-Instruct to Gemini-3-Flash yields gains of +12.45%, +26.73%, and +11.90% across the three benchmarks without retraining the MEMORY model. These gains come from the EXECUTIVE swap alone, which illustrates the modular design: new knowledge lives in the MEMORY model, and the LLM's own parameters are never modified.

Source: MarkTechPost