Chapter 1
Architecture & Pipeline Design
MinerU has three different processing pipelines. Pick the wrong one and your output is garbage — even if everything else is perfect.
The Pipeline Decision Tree
MinerU's architecture isn't one-size-fits-all. The library internally routes documents through different pipelines based on content type. Understanding this routing is the difference between clean Markdown and unusable output.
The three pipelines are:
- Text-based pipeline — For native digital PDFs. Fast, CPU-friendly. Uses PyMuPDF for text extraction + layout preservation.
- Scanned document pipeline — For image-based PDFs. Requires OCR (PaddleOCR or Tesseract) + layout detection model. GPU recommended.
- Mixed pipeline — For PDFs with both text and images. Most real-world documents fall here. Most complex to configure correctly.
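To make the three-way split concrete, here is a minimal sketch of how a document could be sorted into one of these pipelines before processing. It is not MinerU's internal routing logic: `PageStats`, `is_scanned_page`, `choose_pipeline`, and both thresholds are hypothetical. It assumes you have already collected per-page text length and image coverage with a PDF reader of your choice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageStats:
    """Per-page measurements from any PDF reader (hypothetical container)."""
    text_chars: int         # characters found in the embedded text layer
    image_coverage: float   # fraction of the page area covered by images, 0.0 to 1.0


def is_scanned_page(page: PageStats, min_chars: int = 50,
                    min_coverage: float = 0.8) -> bool:
    """Treat a page as scanned when it has almost no text layer and is mostly image.

    Both thresholds are illustrative guesses, not values taken from MinerU.
    """
    return page.text_chars < min_chars and page.image_coverage >= min_coverage


def choose_pipeline(pages: List[PageStats]) -> str:
    """Return 'text', 'scanned', or 'mixed' for a whole document."""
    if not pages:
        raise ValueError("document has no pages")
    scanned = sum(is_scanned_page(p) for p in pages)
    if scanned == 0:
        return "text"      # every page has a usable text layer
    if scanned == len(pages):
        return "scanned"   # every page needs OCR
    return "mixed"         # some pages need OCR, some do not


if __name__ == "__main__":
    doc = [PageStats(1200, 0.05), PageStats(0, 1.0)]
    print(choose_pipeline(doc))
```

Classifying per page rather than per document is a deliberate choice here: a single scanned appendix in an otherwise digital report is enough to push the file into the mixed category.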
CPU vs GPU: When It Actually Matters
The official README says GPU is "recommended" but doesn't quantify the difference. Here's what we measured:
| Pipeline | CPU (32-core) | GPU (T4) | GPU (A10) |
|---|---|---|---|
| Text-based (100 pages) | 12s | 11s | 10s |
| Scanned (100 pages) | 340s | 45s | 22s |
| Mixed (100 pages) | 180s | 38s | 19s |
The takeaway: for text-based PDFs, CPU is fine. For anything with OCR, GPU is a large win: roughly 7.5x to 15x for scanned documents and 4.7x to 9.5x for mixed ones in the measurements above. But GPU type matters less than GPU memory: model loading eats VRAM before throughput matters.
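The speedup ratios can be recomputed directly from the timing table. The sketch below copies the measured seconds into a dictionary and divides CPU time by GPU time. The `TIMINGS` and `speedup` names are only illustrative.

```python
# Wall-clock seconds per 100 pages, copied from the timing table above.
TIMINGS = {
    "text":    {"cpu": 12,  "t4": 11, "a10": 10},
    "scanned": {"cpu": 340, "t4": 45, "a10": 22},
    "mixed":   {"cpu": 180, "t4": 38, "a10": 19},
}


def speedup(pipeline: str, gpu: str) -> float:
    """CPU time divided by GPU time for one pipeline row."""
    row = TIMINGS[pipeline]
    return row["cpu"] / row[gpu]


if __name__ == "__main__":
    for name in TIMINGS:
        t4 = speedup(name, "t4")
        a10 = speedup(name, "a10")
        print(f"{name:8s} T4: {t4:.1f}x  A10: {a10:.1f}x")
```

Run against the table, this gives about 1.1x and 1.2x for text-based, 7.6x and 15.5x for scanned, and 4.7x and 9.5x for mixed, so the mixed pipeline benefits noticeably less than the fully scanned one.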
Backend Selection: vLLM vs sglang vs Native
MinerU supports multiple inference backends for the VLM (Vision Language Model) component. The choice affects both speed and output quality:
- Native transformers — Easiest setup, highest memory usage, slowest inference. Good for testing.
- vLLM — Best throughput for batch processing. PagedAttention for efficient KV cache. Our recommendation for production.
- sglang — Competitive with vLLM, better for structured outputs. Smaller community but active development.
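The guidance in this list can be written down as a small decision helper. This is only a sketch of the reasoning, not a MinerU API: `choose_backend` and the returned labels are hypothetical and do not correspond to any particular MinerU flag or configuration key.

```python
def choose_backend(production: bool, structured_output: bool = False) -> str:
    """Map the trade-offs described above onto a backend label.

    The labels are illustrative strings, not MinerU configuration values.
    """
    if not production:
        # Easiest setup; memory use and speed matter less while testing.
        return "transformers"
    if structured_output:
        # Described above as the better fit for structured outputs.
        return "sglang"
    # Default production choice: best batch throughput.
    return "vllm"


if __name__ == "__main__":
    print(choose_backend(production=False))
    print(choose_backend(production=True))
    print(choose_backend(production=True, structured_output=True))
```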
Full chapter continues with:
Complete pipeline configuration for each document type · DocTR vs PaddleOCR accuracy benchmarks · VLM model selection matrix (which model for which document language) · Memory budget calculator for GPU sizing · Pipeline routing rules for mixed documents