PP-OCRv6 on Hugging Face: 50-Language OCR from 1.5M to 34.5M Parameters
Evaluate PP-OCRv6 online, then integrate lightweight, production-ready OCR with a PaddlePaddle, Transformers, or ONNX Runtime backend.

PP-OCRv6 is the latest generation of PaddleOCR’s universal OCR model family. It is designed for real-world text detection and recognition across documents, screenshots, multilingual images, digital displays, industrial labels, and scene text.
The model family scales from 1.5M to 34.5M parameters across three tiers: tiny, small, and medium. The medium and small tiers support 50 languages, including Simplified Chinese, Traditional Chinese, English, Japanese, and 46 Latin-script languages. Try PP-OCRv6 online: PP-OCRv6 Online Demo.
On PaddleOCR’s official in-house multi-scenario OCR benchmarks, PP-OCRv6_medium reaches 86.2% detection Hmean and 83.2% recognition accuracy. Compared with PP-OCRv5_server, it improves text detection by +4.6 percentage points and text recognition by +5.1 percentage points.
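Detection Hmean is the harmonic mean of precision and recall (the F1 score) over matched text boxes; the box-matching rules of PaddleOCR's in-house benchmark are not described here. The short sketch below shows how Hmean is computed and what PP-OCRv5_server scores the reported percentage-point gains imply. The precision and recall values are hypothetical and chosen only for illustration.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, as commonly reported for text detection."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_baseline(new_score: float, gain_pp: float) -> float:
    """Baseline score implied by a reported score and its gain in percentage points."""
    return round(new_score - gain_pp, 1)


# Hypothetical precision/recall, only to show that Hmean sits closer to the lower value.
print(hmean(0.90, 0.80))  # ≈ 0.847

# PP-OCRv5_server scores implied by the PP-OCRv6_medium numbers and gains quoted above.
print(implied_baseline(86.2, 4.6))  # detection Hmean
print(implied_baseline(83.2, 5.1))  # recognition accuracy
```

By this arithmetic, PP-OCRv5_server would sit at 81.6% detection Hmean and 78.1% recognition accuracy on the same benchmarks.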
PP-OCRv6 focuses on a practical OCR need: producing accurate, structured text outputs with small models and flexible deployment options. For a deeper discussion of why specialized OCR models remain useful in the VLM era, see our previous blog: PP-OCRv5 on Hugging Face: A Specialized Approach to OCR.
PP-OCRv6 introduces architecture, training, and data improvements across detection and recognition. The main design goal is to improve OCR accuracy while keeping model sizes suitable for different deployment settings.
PP-OCRv6 provides three model tiers, covering different model sizes and OCR accuracy levels.
PP-OCRv6 uses PPLCNetV4 as a unified backbone for text detection and text recognition.
For developers, the main benefit is consistency across the model family. The tiny, small, and medium tiers are not unrelated models; they are part of the same OCR family and share a common architectural design.
Text detection is the first stage of the OCR pipeline. Detection quality determines the crops sent to the recognizer, and poor crops often lead to recognition errors.
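To make the dependency between the two stages concrete, here is a minimal sketch of a detect-then-recognize pipeline. It assumes axis-aligned boxes and plain Python lists as images, and `detect` / `recognize` are hypothetical stand-ins for real models; production PaddleOCR pipelines work with rotated or quadrilateral text regions and real image tensors.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # hypothetical axis-aligned box: (x0, y0, x1, y1), end-exclusive


@dataclass
class OCRLine:
    box: Box
    text: str
    score: float


def crop(image: Image, box: Box) -> Image:
    """Cut the boxed region out of the image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def run_ocr(image: Image,
            detect: Callable[[Image], List[Box]],
            recognize: Callable[[Image], Tuple[str, float]]) -> List[OCRLine]:
    """Stage 1 finds text boxes; stage 2 reads each crop. The recognizer only sees
    what detection cropped, so a box that clips characters cannot be repaired later."""
    results = []
    for box in detect(image):
        text, score = recognize(crop(image, box))
        results.append(OCRLine(box, text, score))
    return results


# Toy usage: a fake detector returning two fixed boxes and a fake recognizer
# that emits one "x" per pixel column of the crop it receives.
toy_image = [[0] * 6 for _ in range(4)]
toy_detect = lambda img: [(0, 0, 3, 2), (3, 2, 6, 4)]
toy_recognize = lambda patch: ("x" * len(patch[0]), 0.9)
for line in run_ocr(toy_image, toy_detect, toy_recognize):
    print(line)
```

Returning a list of box/text/score records, rather than one concatenated string, is what lets downstream code keep the layout information that structured OCR outputs rely on.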
PP-OCRv6 upgrades the detection module with RepLKFPN, a lightweight large-kernel feature pyramid network designed for multi-scale text detection while keeping inference efficient.
This is relevant for real-world OCR inputs, where text may be small, dense, rotated, low-resolution, or embedded in complex backgrounds.
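The post does not detail RepLKFPN's internals. For readers new to feature pyramids, the sketch below shows only the generic FPN top-down pathway: coarse, semantically strong maps are upsampled and added into finer maps, so small text on the fine levels benefits from wider context. It is an illustration under simplifying assumptions (single-channel maps, nearest-neighbour upsampling, no lateral or smoothing convolutions) and does not model the large-kernel design that distinguishes RepLKFPN.

```python
from typing import List

Feature = List[List[float]]  # one single-channel feature map


def upsample2x(fmap: Feature) -> Feature:
    """Nearest-neighbour 2x upsampling: each value becomes a 2x2 block."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a: Feature, b: Feature) -> Feature:
    """Element-wise sum of two equally sized maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down(pyramid: List[Feature]) -> List[Feature]:
    """pyramid[0] is the finest (largest) level; each next level is half the size.
    Walk from coarse to fine, merging the upsampled coarser result into each level."""
    merged = [pyramid[-1]]
    for finer in reversed(pyramid[:-1]):
        merged.append(add(finer, upsample2x(merged[-1])))
    return list(reversed(merged))


# Toy three-level pyramid: 4x4, 2x2, and 1x1 maps with constant values.
p2 = [[1.0] * 4 for _ in range(4)]
p3 = [[2.0] * 2 for _ in range(2)]
p4 = [[3.0]]
out = top_down([p2, p3, p4])
print([len(level) for level in out])  # [4, 2, 1]
print(out[0][0][0])  # 1.0 + 2.0 + 3.0 = 6.0
```

The finest output level ends up carrying contributions from every coarser level, which is the property a multi-scale text detector exploits.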
For text recognition, PP-OCRv6 uses EncoderWithLightSVTR. It combines local context modeling with global attention to improve recognition quality on challenging text crops.
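The post names EncoderWithLightSVTR but does not spell out its layers. The sketch below only illustrates the general idea of pairing local context with global attention over the feature sequence of a text crop. It makes several assumptions: a sliding average stands in for a learned convolution, attention is single-head with no learned query/key/value projections, and nothing here reproduces the actual encoder.

```python
import math
from typing import List

Seq = List[List[float]]  # sequence of feature vectors along the text line


def local_mix(x: Seq, window: int = 3) -> Seq:
    """Local context: average each position with its neighbours (stand-in for a convolution)."""
    half = window // 2
    out = []
    for i in range(len(x)):
        neighbours = x[max(0, i - half): i + half + 1]
        out.append([sum(col) / len(neighbours) for col in zip(*neighbours)])
    return out


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(x: Seq) -> Seq:
    """Global context: scaled dot-product self-attention with Q = K = V = x."""
    d = len(x[0])
    out = []
    for q in x:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x])
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def encode(x: Seq) -> Seq:
    """Local mixing, then global attention, each wrapped in a residual connection."""
    local = [[a + b for a, b in zip(u, v)] for u, v in zip(x, local_mix(x))]
    return [[a + b for a, b in zip(u, v)] for u, v in zip(local, global_attention(local))]


# Toy sequence of four 2-dimensional feature vectors (e.g. columns of a text crop).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
y = encode(x)
print(len(y), len(y[0]))  # 4 2
```

The ordering reflects the intuition behind the design: neighbouring positions first resolve stroke-level detail, and attention then lets every position consult the whole text line.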
Source: Hugging Face