Robotics

Reachy Mini goes fully local

AI News Desk

Hugging Face

May 27, 2026

5 min read

After building your Reachy Mini, you'll install the conversation app and start talking to it. Until now, you had to send your audio to a server. But not anymore. Today we'll walk you through running the whole stack locally.

This stack is powered by speech-to-speech, our cascaded VAD → STT → LLM → TTS pipeline (voice activity detection, speech-to-text, language model, text-to-speech) that exposes a Realtime API-compatible /v1/realtime WebSocket. Once you launch the backend, point the robot at it from the UI.
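To make the WebSocket interface a bit more concrete, here is a minimal sketch of what a client might send. It assumes the backend accepts the Realtime-style `input_audio_buffer.append` event with base64-encoded audio, which is what "Realtime API-compatible" suggests but which this post does not spell out. The host and port are placeholders, not documented defaults.

```python
import base64
import json


def realtime_url(host: str = "localhost", port: int = 8000) -> str:
    """Build the WebSocket URL for the /v1/realtime endpoint.

    Assumption: host and port are placeholders; use the address your
    speech-to-speech backend actually listens on.
    """
    return f"ws://{host}:{port}/v1/realtime"


def audio_append_event(pcm_chunk: bytes) -> str:
    """Wrap a chunk of raw audio in a Realtime-style append event.

    Assumption: the backend accepts this event shape; check its docs
    for the expected sample rate and encoding.
    """
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_chunk).decode("ascii"),
    })


if __name__ == "__main__":
    print(realtime_url())
    print(audio_append_event(b"\x00\x00" * 4))
```

A real client would open the URL with a WebSocket library and stream these events as microphone audio arrives; the robot's conversation app does this for you.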

Cascades are the most flexible option in the open-source landscape today, and with the right pieces they're also the fastest. We'll recommend the components we like best, but the whole point of a cascade is that you can swap them. New models drop every week.
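As a rough illustration of why a cascade is easy to rearrange, here is a toy sketch in which each stage is just a function with a fixed input and output type. Everything in it is hypothetical: the `Cascade` class and the stub stages are made up for this example and are not the speech-to-speech library's API.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass
class Cascade:
    """Four swappable stages; each one only sees the previous stage's output."""
    vad: Callable[[bytes], Optional[bytes]]  # speech audio, or None for silence
    stt: Callable[[bytes], str]              # speech audio -> transcript
    llm: Callable[[str], str]                # transcript -> reply text
    tts: Callable[[str], bytes]              # reply text -> audio

    def respond(self, audio: bytes) -> Optional[bytes]:
        speech = self.vad(audio)
        if speech is None:
            return None  # nothing was said, so nothing to answer
        return self.tts(self.llm(self.stt(speech)))


# Hypothetical stand-ins so the sketch runs without any models.
def stub_vad(audio: bytes) -> Optional[bytes]:
    return audio if any(audio) else None


def stub_stt(speech: bytes) -> str:
    return f"<{len(speech)} bytes of speech>"


def stub_llm(text: str) -> str:
    return f"You said: {text}"


def stub_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = Cascade(stub_vad, stub_stt, stub_llm, stub_tts)

if __name__ == "__main__":
    print(pipeline.respond(b"\x01\x02"))
    # Swapping one stage leaves the other three untouched.
    shouty = replace(pipeline, llm=lambda text: text.upper())
    print(shouty.respond(b"\x01\x02"))
```

Because the stages only agree on types, replacing the LLM (or any other piece) does not require touching the rest of the pipeline.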

This blog walks you through running conversations with Reachy Mini fully locally. No cloud, no API keys, no data leaving your machine. Here's a video showing this live:

To serve the LLM, we'll use Hugging Face's llama.cpp. If you need to install it, the simplest way is brew install llama.cpp or winget install llama.cpp; for more help, check the docs. First, we'll run:

And done! The first launch downloads the model; subsequent launches are fast.
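If you want to confirm the server is up before moving on, a small check like the following can help. It is a sketch that assumes llama.cpp's server is listening on its usual default address, http://localhost:8080, and polls its /health endpoint; adjust the URL if you launched it with a different host or port.

```python
import urllib.error
import urllib.request


def llm_server_ready(base_url: str = "http://localhost:8080",
                     timeout: float = 2.0) -> bool:
    """Return True if the server answers /health with HTTP 200.

    Assumption: base_url matches how you launched the server. While the
    model is still loading, the server may answer with an error status,
    which this function reports as "not ready".
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False


if __name__ == "__main__":
    print("LLM server ready:", llm_server_ready())
```

During the first launch this will keep reporting "not ready" until the model download and load have finished.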

We'll begin by installing the library:

Then, while we are serving the LLM in another terminal, we can simply run:

And you can start talking to the model through your terminal! The first launch needs to download Parakeet and Qwen3TTS, but subsequent launches are fast.

Here's a video showing the local conversation mode:

Now that you've tried it in --mode local, you can run the command again without that option to serve speech-to-speech to the robot.

Once you have llama.cpp and speech-to-speech running, you can start the robot with the desktop app and launch the conversation app. In the conversation app's UI, choose local mode by clicking "edit connection" in the HF backend. Here's a video showing how to do it:

And you're done. You can start talking to your robot. Every stage of the pipeline is a trade-off: there are faster TTS models with lower quality and slower STT models with higher quality. We optimized for multilingual use; you might want to optimize for a single language. The rest of the blog covers how to customize.
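When weighing these trade-offs, it helps to know where the time actually goes. The sketch below times each stage of a pipeline in turn. The `run_timed` helper and the stub stages are hypothetical and exist only for illustration; with real models you would pass your own STT, LLM, and TTS callables.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def run_timed(stages: List[Stage], payload: Any) -> Tuple[Any, Dict[str, float]]:
    """Run stages in order, feeding each output to the next stage.

    Returns the final output and the wall-clock seconds spent per stage.
    """
    timings: Dict[str, float] = {}
    for name, stage in stages:
        start = time.perf_counter()
        payload = stage(payload)
        timings[name] = time.perf_counter() - start
    return payload, timings


# Hypothetical stand-ins; swap in real model calls to get useful numbers.
stub_stages: List[Stage] = [
    ("stt", lambda audio: f"<{len(audio)} bytes of speech>"),
    ("llm", lambda text: f"You said: {text}"),
    ("tts", lambda text: text.encode("utf-8")),
]

if __name__ == "__main__":
    output, timings = run_timed(stub_stages, b"\x01\x02\x03")
    for name, seconds in timings.items():
        print(f"{name}: {seconds * 1000:.3f} ms")
```

Comparing these per-stage numbers before and after swapping a model shows whether the change bought latency, quality, or neither.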

Hosted realtime backends are convenient, but running your own engine unlocks three things:

Source: Hugging Face