Mistral OCR 4 Brings Citation-Ready Structured Output to RAG, Agentic, and Enterprise Search Pipelines

Today, Mistral AI released OCR 4, its latest document-understanding model. This new release adds bounding boxes, block classification, and inline confidence scores alongside extracted text. It supports 170 languages across 10 language groups and runs in a single container for fully self-hosted deployments. OCR 4 also serves as an ingestion component for enterprise search, RAG, and domain-specific retrieval pipelines.
Mistral OCR 4 extracts and structures content from a wide range of documents. Previous generations focused on converting a page into clean text and tables. OCR 4 instead returns a structured representation of the whole document.
Each block is localized with a bounding box and classified by type. Block types include titles, tables, equations, signatures, and more. Inline confidence scores are generated per page and per word.
Downstream systems therefore learn more than what a document says. They also learn where each element sits, what role it plays, and how confident the model is. That extra context matters for citations, redactions, and human-in-the-loop verification.
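As a rough illustration of how a downstream system might use this extra context for human-in-the-loop verification, here is a minimal Python sketch. The `Word` and `Block` classes, their field names, the 0.0 to 1.0 confidence range, and the `blocks_needing_review` helper are all assumptions made for this example; they are not Mistral's actual response schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Word:
    text: str
    confidence: float  # assumed to be in the range 0.0-1.0


@dataclass
class Block:
    page: int
    block_type: str  # e.g. "title", "table", "equation", "signature"
    bbox: Tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)
    words: List[Word] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Rebuild the block text from its words.
        return " ".join(w.text for w in self.words)


def blocks_needing_review(blocks: List[Block], threshold: float = 0.8) -> List[Block]:
    """Return blocks that contain at least one word below the threshold."""
    return [
        b for b in blocks
        if any(w.confidence < threshold for w in b.words)
    ]
```

A reviewer queue built this way sends only uncertain blocks to a person. The threshold is a tuning choice that trades review workload against the risk of missed errors.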
OCR 4 accepts common enterprise formats, including PDF, DOC, PPT, and OpenDocument. The model is compact enough to deploy in a single container. Self-managed deployment is available to enterprise customers for data residency and compliance.
Mistral compared OCR 4 against AI-native OCR models, frontier general-purpose models, enterprise document services, and Mistral OCR 3.
Independent annotators preferred OCR 4 over every leading system tested, with win rates averaging 72% across the comparison set. The evaluation used 600+ documents across 12+ languages, sourced from third-party vendors. Annotators ranked each competitor’s output against OCR 4’s, document by document.
On automated benchmarks, OCR 4 scored 85.20 on the public OlmOCRBench. It scored 93.07 on OmniDocBench and 0.98 on Mistral’s internal Crawl Multilingual evaluation.
Two customer data points add context. Rogo reported equivalent accuracy at roughly 8x lower cost and 17x lower latency versus leading agentic parsers. Anaqua measured roughly 4x faster per page than its incumbent provider.
Bounding boxes were Mistral’s most-requested capability. They localize text for in-context highlighting and reliable data pipelines.
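To make the highlighting use case concrete, the sketch below converts a bounding box into a pixel rectangle that a viewer could draw over a rendered page. It assumes the box is given as normalized `(x0, y0, x1, y1)` fractions of the page size with the origin at the top-left corner. The article does not specify OCR 4's coordinate convention, so both that convention and the `to_pixel_rect` helper are hypothetical.

```python
from typing import Tuple


def to_pixel_rect(
    bbox: Tuple[float, float, float, float],
    page_width: int,
    page_height: int,
) -> Tuple[int, int, int, int]:
    """Scale an assumed normalized (x0, y0, x1, y1) box to pixel coordinates."""
    x0, y0, x1, y1 = bbox
    return (
        round(x0 * page_width),
        round(y0 * page_height),
        round(x1 * page_width),
        round(y1 * page_height),
    )


# Example: a box covering part of a page rendered at 1000 x 2000 pixels.
rect = to_pixel_rect((0.1, 0.2, 0.5, 0.4), page_width=1000, page_height=2000)
```

If the real output already uses pixel or point coordinates, the scaling step would be replaced by whatever conversion matches the renderer's resolution.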
Block types and confidence scores serve a different purpose: they drive source-grounded citations, redactions, and human-in-the-loop verification. This structure supports several downstream workloads.
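One way a pipeline could turn this structure into citations and redactions is sketched below in Python. The dictionary keys (`page`, `type`, `bbox`, `text`), the `make_citation` and `redact` helpers, and the choice of `"signature"` as a sensitive type are illustrative assumptions, not part of any documented Mistral interface.

```python
from typing import Dict, FrozenSet, List


def make_citation(doc_id: str, block: Dict) -> Dict:
    """Build a citation record that points back to a block's location."""
    return {
        "doc_id": doc_id,
        "page": block["page"],
        "block_type": block["type"],
        "bbox": list(block["bbox"]),
        "quote": block["text"],
    }


def redact(
    blocks: List[Dict],
    sensitive_types: FrozenSet[str] = frozenset({"signature"}),
) -> List[Dict]:
    """Return copies of the blocks with sensitive block types masked."""
    return [
        {**b, "text": "[REDACTED]"} if b["type"] in sensitive_types else b
        for b in blocks
    ]
```

Because each citation carries the page and bounding box, a RAG answer can link to the exact region it drew from. Redaction by block type leaves the original list untouched and returns masked copies.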
Source: MarkTechPost