Chapter 5
Error Troubleshooting Bible
MinerU has 175 releases, 2,000+ GitHub issues, and error messages that assume you read Chinese. Here are the top 5 errors that kill production deployments — and exactly how to fix them.
Error #1: CUDA Out of Memory
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB
(GPU 0; 15.77 GiB total capacity; 14.92 GiB already allocated)
Why it happens: Each Ray worker loads its own full copy of the VLM model. With two workers, the combined copies need more than 16 GB, which does not fit on a 16 GB card.
Fix: Set RAY_NUM_WORKERS=1 when VLM is enabled, or offload the VLM to a separate GPU. Reduce the batch size (for example, batch_size=2). Run VLM inference in FP16 to shrink the weight footprint.
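The worker-count arithmetic behind this fix can be sketched in a few lines. This is a rough estimate, not a MinerU or Ray API: the function name, the 8 GiB per-worker footprint, and the 1 GiB headroom are all assumptions for illustration. Measure the real footprint on your own hardware.

```python
import math


def max_vlm_workers(total_gib: float, per_worker_gib: float,
                    headroom_gib: float = 1.0) -> int:
    """Estimate how many workers, each holding a full model copy, fit on one GPU.

    per_worker_gib and headroom_gib are assumed values; they are not
    reported by MinerU or Ray and must be measured for your setup.
    """
    if per_worker_gib <= 0:
        raise ValueError("per_worker_gib must be positive")
    usable = total_gib - headroom_gib  # leave room for activations and CUDA context
    return max(0, math.floor(usable / per_worker_gib))


# The 15.77 GiB card from the error message, assuming about 8 GiB per model copy
print(max_vlm_workers(15.77, 8.0))
```

If the estimate comes out at 1, a second worker will very likely hit the out-of-memory error shown above.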
Error #2: Ray Actor Death
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
Cause: The actor is dead because its worker process has died.
Why it happens: A single corrupted PDF causes a segfault in PyMuPDF or PaddlePaddle's C++ layer. The entire Ray actor process dies, taking your batch with it.
Fix: Wrap PDF processing in a subprocess with a timeout. Use Ray's max_restarts and max_task_retries. Isolate suspicious PDFs with a pre-flight check script.
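A minimal sketch of the subprocess-with-timeout idea, using only the Python standard library. The helper name run_isolated is hypothetical, and the command you pass would be whatever script parses a single PDF in your setup. The point is that a segfault or hang kills only the child process, not the Ray actor that launched it.

```python
import subprocess
import sys


def run_isolated(cmd, timeout_s):
    """Run cmd in a child process and return (ok, detail) instead of raising.

    A crash in native code ends the child with a non-zero (on POSIX,
    negative) exit code; a hang is cut off after timeout_s seconds.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False, "timeout"
    if proc.returncode != 0:
        return False, f"exit code {proc.returncode}"
    return True, proc.stdout


if __name__ == "__main__":
    # Stand-in command; replace with your own single-PDF parsing script.
    print(run_isolated([sys.executable, "-c", "print('parsed')"], timeout_s=30))
```

PDFs that come back with ok set to False can be moved to a quarantine folder and retried later, so one bad file no longer takes the whole batch down.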
Error #3: Model Download Corruption
OSError: [Errno 5] Input/output error: 'models/layout/det_db_mv3.pdparams'
RuntimeError: PaddleCheck failed: Cannot open file
Why it happens: MinerU downloads 2GB+ of models on first run. Network interruption, disk full, or Docker layer caching bugs corrupt the download silently.
Fix: Pre-download models in Dockerfile with checksum verification. Use a dedicated model cache volume. Run magic-pdf --verify-models before starting the queue.
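Checksum verification can be sketched with the standard library alone. The function names and the shape of the expected mapping (relative path to SHA-256 hex digest) are assumptions for illustration; MinerU does not necessarily ship such a manifest, so you would record the digests yourself after a known-good download.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large model files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_models(expected, root):
    """Return the relative paths that are missing, empty, or have the wrong digest.

    expected maps a relative file path to its known-good SHA-256 hex digest
    (a hypothetical manifest that you maintain yourself).
    """
    bad = []
    for rel_path, want in expected.items():
        path = Path(root) / rel_path
        if (not path.is_file()
                or path.stat().st_size == 0
                or sha256_of(path) != want.lower()):
            bad.append(rel_path)
    return bad
```

Running verify_models at container start and refusing to launch workers when the returned list is non-empty turns a silent corruption into an explicit, early failure.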
Error #4: Version Conflict Hell
ERROR: pip's dependency resolver does not currently take into account all the packages.
magic-pdf 2.x requires protobuf<=3.20, but ray 2.9 requires protobuf>=3.23
Why it happens: MinerU pins protobuf<=3.20 for PaddlePaddle compatibility, but Ray 2.9+ needs protobuf>=3.23. These two are fundamentally incompatible without patching.
Fix: Use Ray 2.7 (last version compatible with protobuf 3.20), or apply the protobuf compatibility patch that monkey-patches Ray's gRPC imports. The guide includes the exact patch file.
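To see why no single protobuf version satisfies both pins, and to check what is actually installed before a deploy, a small pre-flight helper can be sketched as follows. The function names are hypothetical, and the version parsing handles only plain dotted numeric versions such as 3.20 or 3.23; it is not a replacement for pip's resolver.

```python
from importlib import metadata


def parse_version(text):
    """Turn a dotted numeric version such as '3.20' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def ranges_conflict(max_allowed, min_required):
    """True when an upper bound (<=) lies below a lower bound (>=), so no version fits both."""
    return parse_version(max_allowed) < parse_version(min_required)


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# The two pins from the pip error above: <=3.20 versus >=3.23
print(ranges_conflict("3.20", "3.23"))
print(installed_version("protobuf"))
```

Logging the installed protobuf and ray versions at startup makes it obvious which side of the conflict a given image was built on.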
Error #5: Empty Output / Silent Failure
# No error message. The output file exists but is empty.
$ ls output/0447ae2_origin.md
# (empty file, 0 bytes)
Why it happens: MinerU's pipeline returns empty output when the layout detection model fails to find any text regions. Most common with: dark-background PDFs, scanned documents with poor contrast, and PDFs where text is embedded as curves/outlines.
Fix: Check the pipeline stage logs, not just the final output. Set --log-level DEBUG to see which stage failed. For curve-text PDFs, force the OCR path with --force-ocr. Pre-process low-contrast documents with ImageMagick before parsing.
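Because this failure raises no exception, it helps to scan the output directory after every batch. The sketch below flags zero-byte Markdown files; the function name, the *.md pattern, and the size threshold are assumptions you should adapt to your own output layout.

```python
from pathlib import Path


def find_empty_outputs(output_dir, pattern="*.md", min_bytes=1):
    """Return output files smaller than min_bytes, searched recursively and sorted by path."""
    return sorted(
        path for path in Path(output_dir).rglob(pattern)
        if path.is_file() and path.stat().st_size < min_bytes
    )
```

Any file this returns is a candidate for a rerun with DEBUG logging or the forced OCR path described above.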
Full chapter continues with:
15 additional errors with diagnosis + fix · Complete error reference table (symptom → cause → fix → prevention) · Debugging checklist for production incidents · Memory profiling script for leak detection · Orphan process cleanup automation · Log-level configuration per pipeline stage