DeepSeek-OCR: Contexts Optical Compression
DeepSeek OCR is a next-generation optical character recognition (OCR) solution built by DeepSeek, available through its open-source model hub and API. It handles complex visual-text inputs, including scanned documents, photos, forms, and mixed-layout pages, and unifies text extraction, layout understanding, and visual-context comprehension in a single model. DeepSeek OCR can convert high-resolution imagery at industrial scale (roughly 200,000+ pages per day on a single A100-40G GPU). Try DeepSeek OCR for free below!
Try DeepSeek OCR Live Demo
Experience the power of DeepSeek OCR in real-time. Upload your images and see instant text extraction with high accuracy.

What is DeepSeek OCR
DeepSeek OCR is an advanced optical character recognition system that uses modern neural networks to accurately extract text from images and documents. With multi-language support, it provides powerful text detection and recognition for complex scenarios, and it offers both an intuitive web interface and robust API integration for efficient, flexible text-processing workflows.
- Multi-language Text Recognition: Accurately extract text from images in over 80 languages with advanced neural network technology and language-aware processing capabilities.
- Complex Scene Handling: Process challenging document layouts with curved text, multiple orientations, and complex backgrounds using sophisticated detection algorithms.
- High Accuracy Recognition: Achieve industry-leading text extraction accuracy with optimized optical character recognition and advanced post-processing techniques.
Key Features of DeepSeek OCR
Advanced AI-powered text recognition capabilities designed for professionals and developers worldwide.
Multi-Language Support
Recognize text from over 80 languages including Chinese, English, Arabic, and more with language-aware character recognition.
Robust Text Detection
Detect text regions in complex layouts with curved text, multiple orientations, and challenging background conditions.
High-Speed Processing
Process images rapidly with optimized inference pipeline and GPU acceleration for real-time text extraction results.
Unified Framework
Utilize an integrated text detection and recognition system that provides end-to-end text extraction from images.
Structured Layout Recovery
Preserve document structure including paragraphs, columns, and tables while extracting text with proper formatting.
API Integration
Integrate powerful OCR capabilities into your applications with RESTful API and SDK support for multiple programming languages.
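Beyond the hosted API, the open-source release can be run locally with Hugging Face Transformers (as the posts below also note). The following is a minimal sketch, assuming the custom infer() helper that the deepseek-ai/DeepSeek-OCR repository ships via trust_remote_code; the image file name and output directory are hypothetical, and exact argument names may differ between releases, so consult the model card before use.

```python
# Minimal sketch: running DeepSeek-OCR locally with Hugging Face Transformers.
# Assumption: the repo's remote code exposes an infer() convenience method as
# described on its model card; treat the call below as illustrative, not exact.
from transformers import AutoModel, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-OCR"

# trust_remote_code is required because the repository ships custom model code.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
model = model.eval().cuda()  # a CUDA GPU (e.g. an A100) is assumed

# Prompt with an <image> placeholder plus a plain-text instruction.
prompt = "<image>\nConvert the document to markdown."

result = model.infer(
    tokenizer,
    prompt=prompt,
    image_file="invoice_page.png",  # hypothetical local input image
    output_path="./ocr_output",     # directory for extracted text/markdown
)
print(result)
```

For higher-throughput batch or serving workloads, the same checkpoint can also be run with vLLM, as highlighted in the vLLM post quoted below.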
What People Are Saying About DeepSeek-OCR on X
If you enjoy using DeepSeek OCR, please share your experience on X (Twitter).
Massively unexpected update from DeepSeek: a powerful, high-compression MoE OCR model.
> In production, DeepSeek-OCR can generate 33 million pages of data per day for LLMs/VLMs using 20 nodes (x8 A100-40G).
They want ALL the tokens. You're welcome to have some too. https://t.co/ks97gjFuhd pic.twitter.com/mXV08ifRle
— Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxesTex) October 20, 2025
DeepSeek-OCR has some weird architectural choices for the LLM decoder: DeepSeek3B-MoE-A570M
-> uses MHA, no MLA (not even GQA?)
-> 2 shared experts (like DeepSeek V2, but V3 only has 1)
-> quite low sparsity, activation ratio is 12.5%. For V3 it’s 3.52%, for V2 it’s 5%
-> not… pic.twitter.com/nOYptOn3OE
— elie (@eliebakouch) October 20, 2025
Letsss gooo! DeepSeek just released a 3B OCR model on Hugging Face 🔥
Optimised to be token efficient AND scale ~200K+ pages/day on A100-40G
Same arch as DeepSeek VL2
Use it with Transformers, vLLM and more 🤗 https://t.co/n4kHihS3At
— Vaibhav (VB) Srivastav (@reach_vb) October 20, 2025
NEW DeepSeek OCR model that outperforms dots ocr while prefilling 3x less tokens pic.twitter.com/g9T93PndFb
— Casper Hansen (@casper_hansen_) October 20, 2025
🚀 DeepSeek-OCR — the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8.5 for day-0 model support.
🧠 Compresses visual contexts up to 20× while keeping… pic.twitter.com/bx3d7LnfaR
— vLLM (@vllm_project) October 20, 2025
🚨 DeepSeek just did something wild.
They built an OCR system that compresses long text into vision tokens literally turning paragraphs into pixels.
Their model, DeepSeek-OCR, achieves 97% decoding precision at 10× compression and still manages 60% accuracy even at 20×. That… pic.twitter.com/5ChoESanC8
— God of Prompt (@godofprompt) October 20, 2025
is it just me or is this deepseek paper really…weird? like the flagship results are all about compression ratios and they’re gesturing at implications for LLM memory but… it’s an OCR model? are they suggesting that LLMs should ingest OCR embeddings of screenshots of old notes?? pic.twitter.com/ptxkgANIeW
— will brown (@willccbb) October 20, 2025
DeepSeek-OCR: https://t.co/Hww4tubUiS
— Ray Fernando (@RayFernando1337) October 20, 2025
I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter.
The more interesting part for me (esp as a computer vision at heart who is temporarily masquerading as a natural language… https://t.co/AxRXBdoO0F
— Andrej Karpathy (@karpathy) October 20, 2025
Compress everything visually!
DeepSeek has just released DeepSeek-OCR, a state-of-the-art OCR model with 3B parameters.
Core idea: explore long-context compression via 2D optical mapping.
Architecture:
- DeepEncoder → compresses high-res inputs into few vision tokens;
-… pic.twitter.com/qbRTi8ViLY
— 机器之心 JIQIZHIXIN (@jiqizhixin) October 20, 2025
