How to Use Transformers.js in a Chrome Extension
A step-by-step guide on integrating Transformers.js into a Chrome extension, leveraging Gemma 4 E2B for enhanced web navigation.

We recently released a Transformers.js demo browser extension powered by Gemma 4 E2B to help users navigate the web. While building it, we encountered several practical challenges regarding Manifest V3 runtimes, model loading, and messaging that are worth sharing. This guide is aimed at developers who want to run local AI features in a Chrome extension with Transformers.js under Manifest V3 constraints.
By the end of this guide, you will have the same architecture used in this project: a background service worker that hosts the models, a side panel chat UI, and a content script for page-level actions. We will recreate the core architecture of the Transformers.js Gemma 4 Browser Assistant, using the published extension as a reference and the open-source codebase as the implementation map. One scope note before diving in: this guide does not cover the React UI layer or the Vite build configuration.
The focus here is on the high-level architecture decisions: what runs in each Chrome runtime, and how those pieces are orchestrated. If Manifest V3 is new to you, read this short overview first: What is Manifest V3? In MV3, your architecture starts in public/manifest.json.
This project defines three entry points: the background service worker, the side panel, and the content script. The background service worker also handles chrome.action.onClicked to open the side panel for the active tab. A related entry point worth knowing: a popup can be defined with action.default_popup and works well for quick actions. This project uses a side panel for persistent chat, but the orchestration pattern is the same.
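A minimal manifest wiring up these three entry points could look like the sketch below. The file names and paths are illustrative, not the project's actual build output; the `side_panel` key and `sidePanel` permission are the standard MV3 way to register a side panel, and an (even empty) `action` key is required for chrome.action.onClicked to fire.

```json
{
  "manifest_version": 3,
  "name": "Browser Assistant (sketch)",
  "version": "0.1.0",
  "permissions": ["sidePanel", "storage"],
  "background": { "service_worker": "background.js", "type": "module" },
  "side_panel": { "default_path": "sidepanel.html" },
  "content_scripts": [
    { "matches": ["<all_urls>"], "js": ["content.js"] }
  ],
  "action": {}
}
```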
The key design decision is to keep heavy orchestration in the background and keep UI/page logic thin. One practical consequence of this division is that the conversation history also lives in the background (Agent.chatMessages): the UI sends events like AGENT_GENERATE_TEXT, background appends the message, runs inference, then emits MESSAGES_UPDATE back to the side panel. This split avoids duplicate model loads, keeps the UI responsive, and respects Chrome's security boundaries around DOM access.
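The background-owned history flow described above can be sketched as follows. AGENT_GENERATE_TEXT, MESSAGES_UPDATE, and Agent.chatMessages are the project's own names; the Agent shape and the runInference stand-in are illustrative, not the actual implementation.

```typescript
// Sketch of the background-owned conversation flow.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

class Agent {
  chatMessages: ChatMessage[] = []; // single source of truth for history
  private runInference: (history: ChatMessage[]) => Promise<string>;

  constructor(runInference: (history: ChatMessage[]) => Promise<string>) {
    this.runInference = runInference;
  }

  async handle(
    message: { type: string; content: string },
    emit: (update: { type: string; messages: ChatMessage[] }) => void,
  ): Promise<void> {
    if (message.type !== "AGENT_GENERATE_TEXT") return;
    // 1. Append the user's message to background-owned history.
    this.chatMessages.push({ role: "user", content: message.content });
    // 2. Run inference over the full history.
    const reply = await this.runInference(this.chatMessages);
    this.chatMessages.push({ role: "assistant", content: reply });
    // 3. Notify the side panel so it can re-render.
    emit({ type: "MESSAGES_UPDATE", messages: this.chatMessages });
  }
}

// In the extension, wiring would use chrome.runtime messaging; the guard
// lets this sketch also run outside a browser.
const chromeApi = (globalThis as any).chrome;
if (chromeApi?.runtime) {
  const agent = new Agent(async () => "(model reply)");
  chromeApi.runtime.onMessage.addListener((msg: any) => {
    agent.handle(msg, (update) => chromeApi.runtime.sendMessage(update));
  });
}
```

Because the side panel only renders what MESSAGES_UPDATE carries, closing and reopening the panel cannot lose conversation state.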
Once runtimes are separated, messaging becomes the backbone. In this project, all messages are typed through enums in src/shared/types.ts. The orchestration rule is simple: the background is the single coordinator; side panel and content script are specialized workers that request actions and render results.
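The typed-message pattern can be approximated like this. Only AGENT_GENERATE_TEXT and MESSAGES_UPDATE are named in this guide, so the rest of the shapes here are illustrative; the point is the discriminated union, which lets the compiler check that every message type is handled.

```typescript
// Approximate sketch of a shared message vocabulary (src/shared/types.ts
// in spirit, not a copy of it).
export enum MessageType {
  AGENT_GENERATE_TEXT = "AGENT_GENERATE_TEXT",
  MESSAGES_UPDATE = "MESSAGES_UPDATE",
}

export interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// A discriminated union over `type` keeps every runtime speaking the
// same protocol without stringly-typed ad-hoc messages.
export type ExtensionMessage =
  | { type: MessageType.AGENT_GENERATE_TEXT; content: string }
  | { type: MessageType.MESSAGES_UPDATE; messages: ChatMessage[] };

// Narrowing on `type` gives exhaustive, compiler-checked handling.
export function describe(msg: ExtensionMessage): string {
  switch (msg.type) {
    case MessageType.AGENT_GENERATE_TEXT:
      return `generate: ${msg.content}`;
    case MessageType.MESSAGES_UPDATE:
      return `update: ${msg.messages.length} messages`;
  }
}
```

Both the side panel and the content script import these types, so a renamed event fails at compile time instead of silently dropping messages at runtime.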
In src/shared/constants.ts, this extension uses two model roles: Gemma 4 E2B as the language model and MiniLM as the embedding model. The split is intentional: Gemma 4 handles reasoning and tool decisions, while MiniLM generates vector embeddings for the semantic similarity search behind ask_website and find_history. All inference runs in the background service worker (src/background/background.ts), which gives a single model host for all tabs and sessions, avoids duplicate memory usage, and keeps the side panel UI responsive. Because models are loaded from the background service worker, artifacts are cached under the extension origin (chrome-extension://) rather than per-website origins, so the whole extension install shares one cache.
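The semantic search behind ask_website and find_history reduces to ranking candidate passages by cosine similarity between embedding vectors. A minimal sketch of that ranking step, assuming the vectors have already been produced by the MiniLM pipeline (the function names here are illustrative):

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Candidate {
  text: string;
  vec: number[]; // embedding from the MiniLM pipeline (illustrative)
}

// Return candidates sorted best-match-first against the query embedding.
function rankBySimilarity(queryVec: number[], candidates: Candidate[]) {
  return candidates
    .map(({ text, vec }) => ({ text, score: cosineSimilarity(queryVec, vec) }))
    .sort((x, y) => y.score - x.score);
}
```

The top-ranked passages are then handed to Gemma 4 as context, which is why the two model roles compose cleanly: MiniLM narrows the haystack, Gemma 4 answers from it.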
Source: Hugging Face