Chapter 4
Performance Tuning
Your GPU sits at 40% utilization while a queue of 5,000 PDFs piles up. The defaults aren't optimized for your hardware. Here's how to fix it.
The Three Knobs That Actually Matter
MinerU performance tuning boils down to three parameters. Everything else is marginal:
- Batch size — How many pages MinerU processes in one forward pass. Bigger = more GPU throughput, but more VRAM. The sweet spot is usually 4-8 for scanned PDFs on a T4, 8-16 on an A10.
- Concurrent workers — How many Ray actors process PDFs in parallel. Each worker loads its own model copy. Too many and you OOM. Too few and GPU sits idle.
- VLM offload — Whether the VLM runs on the same GPU as OCR or a separate one. Splitting them can double throughput for mixed-pipeline documents.
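As a rough illustration, the three knobs can be held in one small config object. This is a minimal sketch, not MinerU's actual configuration API: `TuningConfig`, `BATCH_SIZE_RANGES`, and `starting_config` are hypothetical names. The batch-size ranges are the scanned-PDF figures quoted above. Starting at the low end with a single worker is an assumption about a safe first step.

```python
from dataclasses import dataclass

# Batch-size ranges for scanned PDFs, taken from the text above.
BATCH_SIZE_RANGES = {"T4": (4, 8), "A10": (8, 16)}


@dataclass(frozen=True)
class TuningConfig:
    """Hypothetical container for the three knobs (not a MinerU class)."""
    batch_size: int
    concurrent_workers: int
    vlm_on_separate_gpu: bool


def starting_config(gpu: str, vlm_on_separate_gpu: bool = False) -> TuningConfig:
    """Return a conservative starting point: low end of the range, one worker."""
    low, _high = BATCH_SIZE_RANGES[gpu]
    return TuningConfig(
        batch_size=low,
        concurrent_workers=1,
        vlm_on_separate_gpu=vlm_on_separate_gpu,
    )


if __name__ == "__main__":
    print(starting_config("T4"))
```

From a starting point like this, one reasonable approach is to raise one knob at a time and watch VRAM usage and throughput after each change.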
GPU Memory Budget
Before tuning, understand where your VRAM goes. Here's the memory budget for a typical T4 (16GB):
| Component | VRAM Usage | Notes |
|---|---|---|
| Layout detection model | ~1.2 GB | DocTR or PaddleOCR layout |
| OCR recognition model | ~800 MB | PaddleOCR rec |
| VLM (optional) | ~4-8 GB | Depends on model size |
| CUDA context + overhead | ~500 MB | CUDA runtime + cuDNN |
| Batch processing workspace | ~2-4 GB | Scales with batch size |
| Remaining for 2nd worker | ~2-7 GB | Only if VLM is offloaded |
On a T4 with VLM enabled, you get one worker. Without VLM (text-only pipeline), you can run 2-3 workers on the same GPU.
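The worker counts above follow from simple arithmetic on the table. The sketch below reproduces that arithmetic under two stated assumptions: each worker is a separate process that pays for its own model copies, CUDA context, and batch workspace, and the workspace defaults to 3 GB, the midpoint of the 2-4 GB range. `WorkerFootprint` and `max_workers` are hypothetical helper names, and the figures are the approximate values from the table, not measurements.

```python
from dataclasses import dataclass


@dataclass
class WorkerFootprint:
    """Approximate per-worker VRAM in GB, using the table's figures."""
    layout_gb: float = 1.2
    ocr_gb: float = 0.8
    cuda_overhead_gb: float = 0.5
    workspace_gb: float = 3.0  # assumed midpoint of the 2-4 GB range
    vlm_gb: float = 0.0        # 0 when the VLM is disabled or offloaded

    def total_gb(self) -> float:
        return (self.layout_gb + self.ocr_gb + self.cuda_overhead_gb
                + self.workspace_gb + self.vlm_gb)


def max_workers(gpu_vram_gb: float, footprint: WorkerFootprint) -> int:
    """How many whole workers fit in the given amount of VRAM."""
    return int(gpu_vram_gb // footprint.total_gb())


if __name__ == "__main__":
    t4_gb = 16.0
    print("no VLM, 3 GB workspace:", max_workers(t4_gb, WorkerFootprint()))
    print("no VLM, 2 GB workspace:", max_workers(t4_gb, WorkerFootprint(workspace_gb=2.0)))
    print("6 GB VLM on same GPU:", max_workers(t4_gb, WorkerFootprint(vlm_gb=6.0)))
```

In practice the usable VRAM is somewhat below the nominal 16 GB, so treat the result as an upper bound and confirm it with a real run.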
Quick Wins Before Deep Tuning
- Disable OCR for text-based PDFs — MinerU sometimes runs OCR on pages that don't need it. Set `enable_ocr: auto` instead of `true`.
- Pre-sort PDFs by page count — Batch similar-sized PDFs together. One 500-page PDF in a batch of 10-page PDFs creates a straggler that holds up the entire batch.
- Use FP16 for VLM — Half precision cuts VLM memory by ~40% with negligible accuracy loss for document understanding tasks.
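The pre-sorting tip can be sketched in a few lines: sort by page count, then cut the sorted list into fixed-size batches so each batch holds similar-sized documents. This is an illustrative sketch only. `batch_by_page_count` is a hypothetical helper, and it assumes the page counts have already been collected elsewhere, for example with a PDF library of your choice.

```python
from typing import Dict, List


def batch_by_page_count(page_counts: Dict[str, int], batch_size: int) -> List[List[str]]:
    """Group PDF paths into batches of similar page count.

    page_counts maps a PDF path to its number of pages.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Sort by page count, breaking ties by path so the order is deterministic.
    ordered = sorted(page_counts, key=lambda path: (page_counts[path], path))
    # Slice the sorted list into consecutive batches.
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


if __name__ == "__main__":
    counts = {"a.pdf": 10, "b.pdf": 500, "c.pdf": 12, "d.pdf": 480}
    for batch in batch_by_page_count(counts, batch_size=2):
        print(batch)
```

With the sample input, the two short PDFs land in one batch and the two long ones in another, so neither batch waits on a single straggler.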
Full chapter continues with:
Complete benchmarking scripts for your specific hardware · Tuning reference table (T4/A10/A100/L40S at batch sizes 1-32) · Concurrent worker scaling formula · GPU memory profiler script · Throughput optimization by document type · Cost-per-page optimization for cloud GPU instances