Chapter 1

Architecture & Pipeline Design

MinerU has three different processing pipelines. Pick the wrong one and your output is garbage — even if everything else is perfect.

The Pipeline Decision Tree

MinerU's architecture isn't one-size-fits-all. The library internally routes documents through different pipelines based on content type. Understanding this routing is the difference between clean Markdown and unusable output.

The three pipelines are:

Text-based pipeline — For native digital PDFs. Fast, CPU-friendly. Uses PyMuPDF for text extraction + layout preservation.
Scanned document pipeline — For image-based PDFs. Requires OCR (PaddleOCR or Tesseract) + layout detection model. GPU recommended.
Mixed pipeline — For PDFs with both text and images. Most real-world documents fall here. Most complex to configure correctly.
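The three-way split above can be sketched as a small routing function. This is an illustration only, not MinerU's internal routing logic: the function name `choose_pipeline`, the labels it returns, and the 50-character threshold are all assumptions made for the example. The per-page character counts are assumed to come from a text extractor, for instance the length of PyMuPDF's `page.get_text()` result for each page.

```python
from typing import Sequence


def choose_pipeline(chars_per_page: Sequence[int], min_chars: int = 50) -> str:
    """Guess a pipeline from the number of extractable characters on each page.

    Hypothetical helper: `min_chars` is an arbitrary cutoff below which a page
    is treated as image-only (i.e. it would need OCR).
    """
    if not chars_per_page:
        raise ValueError("document has no pages")

    # Count pages that carry a usable embedded text layer.
    text_pages = sum(1 for n in chars_per_page if n >= min_chars)

    if text_pages == len(chars_per_page):
        return "text"      # every page has embedded text
    if text_pages == 0:
        return "scanned"   # no page has embedded text, so OCR is required
    return "mixed"         # some pages have text, some do not


print(choose_pipeline([1200, 980, 1510]))  # native digital PDF
print(choose_pipeline([0, 3, 0]))          # image-only scan
print(choose_pipeline([1200, 0, 950]))     # a scanned page inside a digital PDF
```

A page-level character count is a coarse signal. A page that mixes embedded text with scanned figures would still count as a text page here, so treat the result as a first guess rather than a verdict.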

CPU vs GPU: When It Actually Matters

The official README says GPU is "recommended" but doesn't quantify the difference. Here's what we measured:

Pipeline	CPU (32-core)	GPU (T4)	GPU (A10)
Text-based (100 pages)	12s	11s	10s
Scanned (100 pages)	340s	45s	22s
Mixed (100 pages)	180s	38s	19s

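The takeaway: for text-based PDFs, CPU is fine; a GPU saves only a second or two per 100 pages. Once OCR is involved, the GPU matters: the scanned pipeline ran about 7.5x faster on a T4 and 15x faster on an A10, and the mixed pipeline about 5x and 9.5x faster. But GPU type matters less than GPU memory: the models have to fit in VRAM before throughput becomes the bottleneck.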
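The speedup factors follow directly from the table. A short sketch that recomputes them from the measured times (the dictionary layout and the `speedups` helper are just for this example):

```python
# Seconds per 100 pages, copied from the table above.
timings = {
    "Text-based": {"cpu": 12, "t4": 11, "a10": 10},
    "Scanned": {"cpu": 340, "t4": 45, "a10": 22},
    "Mixed": {"cpu": 180, "t4": 38, "a10": 19},
}


def speedups(row: dict) -> dict:
    """Return the CPU-to-GPU speedup factor for each GPU, rounded to one decimal."""
    return {gpu: round(row["cpu"] / row[gpu], 1) for gpu in ("t4", "a10")}


for name, row in timings.items():
    print(name, speedups(row))
# Text-based {'t4': 1.1, 'a10': 1.2}
# Scanned {'t4': 7.6, 'a10': 15.5}
# Mixed {'t4': 4.7, 'a10': 9.5}
```

The scanned pipeline gains the most because OCR dominates its runtime. The mixed pipeline gains less on the same hardware, at about 4.7x on the T4.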

Backend Selection: vLLM vs sglang vs Native

MinerU supports multiple inference backends for the VLM (Vision Language Model) component. The choice affects both speed and output quality:

Native transformers — Easiest setup, highest memory usage, slowest inference. Good for testing.
vLLM — Best throughput for batch processing. PagedAttention for efficient KV cache. Our recommendation for production.
sglang — Competitive with vLLM, better for structured outputs. Smaller community but active development.
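The recommendations above can be written down as a tiny decision helper. This is a sketch of the guidance in this section only: `suggest_backend` and the strings it returns are hypothetical labels for the example, not MinerU configuration values or CLI flags.

```python
def suggest_backend(batch_production: bool, needs_structured_output: bool = False) -> str:
    """Map a workload description to one of the three backends discussed above.

    The returned strings are illustrative labels, not real MinerU option names.
    """
    if not batch_production:
        # Easiest setup; acceptable when speed and memory are not a concern.
        return "native-transformers"
    if needs_structured_output:
        # Described above as the stronger choice for structured outputs.
        return "sglang"
    # Default production recommendation: best batch throughput.
    return "vllm"


print(suggest_backend(batch_production=False))
print(suggest_backend(batch_production=True))
print(suggest_backend(batch_production=True, needs_structured_output=True))
```

Keeping the choice in one function makes it easy to revisit if benchmarks on your own documents point to a different default.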


