Mistral Launches OCR 4, Pushing Document Extraction Beyond Text
Mistral AI releases OCR 4, a document intelligence model that extracts structured representations of documents, complete with bounding boxes and confidence scores.

Mistral AI on Tuesday released OCR 4, a document intelligence model that moves beyond raw text extraction to return structured representations of entire documents — complete with bounding boxes, block-type classification, and per-word confidence scores. The release marks Mistral's fourth generation of optical character recognition technology in roughly 15 months and lands at a moment when the company's pitch for European AI sovereignty has never been more commercially relevant. The model supports 170 languages across 10 language groups, accepts PDF, DOC, PPT, and OpenDocument formats, and can be deployed as a single container on an organization's own infrastructure — a capability Mistral is positioning directly at enterprises in regulated industries that cannot route sensitive documents through U.S.-jurisdiction cloud APIs.
"Mistral OCR 4 extracts and structures content from a wide range of documents," the company said in its announcement. "Where previous generations focused on converting a page into clean text and tables, OCR 4 returns a structured representation of the document." The model is available immediately through the Mistral API, Document AI in Mistral Studio, Amazon SageMaker, and Microsoft Foundry, with Snowflake Parse Document support coming soon. Pricing starts at $4 per 1,000 pages, dropping to $2 per 1,000 pages through a batch API discount.
OCR 4 treats every document as a semantic map, not a wall of text. The central engineering shift in OCR 4 is structural. Rather than outputting a flat stream of extracted text — the paradigm that has defined OCR for decades — the model returns a layered representation in which every block is localized with a bounding box, classified by type (title, table, equation, signature, and others), and scored for confidence at both the page and word level.
Mistral says bounding boxes were its most-requested capability. The reason is straightforward: without location data, downstream systems cannot trace an extracted fact back to its source on a specific page. That traceability gap has been a persistent friction point for enterprises building retrieval-augmented generation (RAG) pipelines, compliance workflows, or any application where "where did this number come from?" is a question that needs an auditable answer.
Block classification addresses a related problem. A paragraph tagged as a "title" can segment a document into hierarchical chunks for semantic search. A block tagged as a "table" can be routed to a structured-data pipeline rather than a text summarizer.
A block tagged as a "signature" can trigger a redaction workflow in a compliance system. These are not novel ideas in isolation, but packaging them as first-class outputs of the OCR model itself — rather than requiring a separate layout-analysis stage — removes an integration layer that enterprise teams have historically had to build and maintain themselves. The confidence scores serve a dual purpose.
Source: VentureBeat