Chapter 2
Docker Production Setup
MinerU's dependency chain is deep: Python 3.10-3.12, Ray, PyMuPDF, PaddleOCR, vLLM, and 2-4GB of model files. A single mismatched version can break the install or the pipeline at runtime. A Docker image avoids this by pinning the whole stack once.
Why Docker Matters for MinerU
MinerU has one of the deepest dependency chains of any Python PDF tool. A production deployment touches:
- Python 3.10-3.12 — Strict version window. 3.9 and 3.13 both fail on different dependencies.
- ray 2.x — Distributed computing framework. Version-sensitive with protobuf and grpcio.
- PyMuPDF + PaddleOCR + vLLM — Three heavy libraries with conflicting transitive dependencies.
- Model files — 2-4GB of layout detection + OCR + VLM models that must be pre-downloaded.
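A Docker image pins all of this. No "works on my machine." No CUDA library mismatch inside the container. No model re-download on every deploy, provided the models are baked into the image or cached in a volume. The host still needs an NVIDIA driver compatible with the image's CUDA version.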
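Even with a pinned image, a fail-fast check at container start can catch a wrong base image early. The following is a minimal sketch, assuming the 3.10-3.12 window stated above; the function names are illustrative and not part of MinerU.

```python
import sys

# Supported window taken from the list above: Python 3.10 through 3.12.
MIN_PY = (3, 10)
MAX_PY = (3, 12)


def python_supported(version=None):
    """Return True if (major, minor) falls inside the supported window."""
    major, minor = tuple(version or sys.version_info)[:2]
    return MIN_PY <= (major, minor) <= MAX_PY


def require_supported_python(version=None):
    """Raise RuntimeError with a readable message if the interpreter is out of range."""
    current = tuple(version or sys.version_info)[:3]
    if not python_supported(current):
        found = ".".join(str(part) for part in current)
        raise RuntimeError(f"Unsupported Python {found}; expected 3.10 to 3.12")
```

Calling require_supported_python() as the first line of an entrypoint script turns a confusing import error deep in a dependency into a one-line message.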
The Dependency Chain, Visualized
mineru (magic-pdf)
|-- ray[default] >= 2.0
|   |-- protobuf >= 3.15
|   |-- grpcio >= 1.32
|-- PyMuPDF >= 1.23
|-- PaddleOCR >= 2.7
|   |-- paddlepaddle-gpu >= 2.5 (GPU path)
|   |-- paddlepaddle (CPU path)
|-- vLLM >= 0.4 (GPU path, optional)
|   |-- transformers >= 4.40
|   |-- torch >= 2.0
|-- Models (downloaded at first run)
    |-- layout detection (~300MB)
    |-- OCR recognition (~900MB)
    |-- VLM (~2GB, GPU path only)
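The minimum versions in the tree can be checked inside a built image with a short audit script. This is a sketch only: the minimums are copied from the tree above, the PyPI distribution names are assumptions, and the version parser is deliberately simple (the `packaging` library handles the full version grammar if it is available).

```python
from importlib import metadata

# Minimums copied from the tree above.
# Distribution names are assumptions about how each package is published.
MINIMUMS = {
    "ray": "2.0",
    "protobuf": "3.15",
    "grpcio": "1.32",
    "PyMuPDF": "1.23",
    "paddleocr": "2.7",
}


def parse(version):
    """Turn '2.9.3' or '2.7.0.post1' into a tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two versions after padding the shorter one with zeros."""
    a, b = parse(installed), parse(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def audit(minimums=MINIMUMS):
    """Return a {package: status} report for the current environment."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        status = "ok" if meets_minimum(installed, minimum) else "too old"
        report[name] = f"{status} ({installed}, need >= {minimum})"
    return report
```

Running audit() as a build step or health check surfaces a drifted dependency before the first PDF is processed.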
Key Dockerfile Decisions
Before you write a single line, decide:
- Base image: nvidia/cuda:12.1.1-runtime-ubuntu22.04 for GPU, python:3.11-slim-bookworm for CPU-only.
- Model strategy: Bake models into the image (faster cold start, larger image) or volume-mount them (smaller image, download on first run).
- Ray setup: Single-node (just ray.init() in the container) vs multi-node (Ray head + worker containers).
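The model and Ray decisions usually surface in the container entrypoint as environment-driven switches. The sketch below assumes two hypothetical environment variables, RAY_HEAD_ADDRESS and a model directory path passed by the caller; neither is defined by MinerU or Ray. Only ray.init() and its address argument are real Ray API.

```python
import os
from pathlib import Path


def ray_init_kwargs(env=os.environ):
    """Choose ray.init() arguments.

    RAY_HEAD_ADDRESS is a hypothetical variable for this sketch:
    unset or empty means single-node, otherwise connect to that head node.
    """
    address = env.get("RAY_HEAD_ADDRESS", "").strip()
    return {"address": address} if address else {}


def models_ready(model_dir):
    """Return True if the model directory exists and contains at least one entry.

    True means the models were baked in or the volume is already populated;
    False means a first-run download is still needed.
    """
    path = Path(model_dir)
    return path.is_dir() and any(path.iterdir())


def start_ray(env=os.environ):
    """Start a local Ray instance or join an existing cluster."""
    import ray  # imported lazily so the helpers above work without Ray installed

    return ray.init(**ray_init_kwargs(env))
```

With this shape, the same image can serve both deployment modes: a single container leaves RAY_HEAD_ADDRESS unset, while worker-side clients in a multi-node setup set it to the head container's address.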
Full chapter continues with:
- Complete multi-stage production Dockerfile (GPU + CPU variants)
- docker-compose.yml with MinerU + vLLM + Redis
- Model download caching strategy (no re-downloading 2GB models per rebuild)
- Health checks, resource limits, volume mounts
- CI integration for automated image builds