Free preview — Top 5 errors (of 20+). Get the guide for the complete error reference.

Chapter 5

Error Troubleshooting Bible

MinerU has 175 releases, 2,000+ GitHub issues, and error messages that assume you read Chinese. Here are the top 5 errors that kill production deployments — and exactly how to fix them.

Error #1: CUDA Out of Memory

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB
(GPU 0; 15.77 GiB total capacity; 14.92 GiB already allocated)

Why it happens: Each Ray worker loads its own full copy of the VLM, so GPU memory grows with the worker count. Two workers, each holding a VLM copy, need more than 16 GB and overflow a 16 GB card.

Fix: Set RAY_NUM_WORKERS=1 when the VLM is enabled, or offload the VLM to a separate GPU. Lower the batch size to batch_size=2. Enable FP16 VLM inference, which roughly halves weight memory compared with FP32.
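The sizing arithmetic behind this fix can be made explicit with a small helper. This is a hypothetical sketch, not part of MinerU or Ray. The 8 GiB weight size and the 0.5 GiB-per-page activation cost are placeholders, so replace them with peaks you measure yourself (for example with torch.cuda.max_memory_allocated()). The 1 GiB reserve is an assumed safety margin.

```python
def per_worker_gib(weights_fp32_gib: float, activation_gib_per_page: float,
                   batch_size: int, fp16: bool) -> float:
    """Footprint of ONE worker: its own full copy of the VLM weights plus the
    activations for one batch. FP16 stores each weight in 2 bytes instead of 4,
    so it halves the weight term. Activation cost is taken as already measured
    for your chosen precision (an assumption of this sketch)."""
    weights = weights_fp32_gib / 2 if fp16 else weights_fp32_gib
    return weights + activation_gib_per_page * batch_size


def max_workers_per_gpu(gpu_total_gib: float, worker_gib: float,
                        reserve_gib: float = 1.0) -> int:
    """How many full worker copies fit on one card, keeping reserve_gib free
    for the CUDA context and allocator fragmentation (assumed margin)."""
    usable = gpu_total_gib - reserve_gib
    if usable <= 0 or worker_gib <= 0:
        return 0
    return int(usable // worker_gib)


# Placeholder numbers (NOT measured MinerU figures) for a 15.77 GiB card:
full_precision = per_worker_gib(8.0, 0.5, batch_size=4, fp16=False)  # 10.0 GiB
half_precision = per_worker_gib(8.0, 0.5, batch_size=2, fp16=True)   # 5.0 GiB
print(max_workers_per_gpu(15.77, full_precision))  # 1
print(max_workers_per_gpu(15.77, half_precision))  # 2
```

In this model every extra worker adds a whole copy of the weights, which is why dropping to one worker is the first lever, and a smaller batch plus FP16 only shrink each copy.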

Error #2: Ray Actor Death

ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
        Cause: The actor is dead because its worker process has died.

Why it happens: A single corrupted PDF can trigger a segfault in PyMuPDF's native MuPDF code or in PaddlePaddle's C++ layer. A segfault kills the whole worker process, so the Ray actor dies and takes your entire batch with it.

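Fix: Wrap PDF processing in a subprocess with a timeout, so a crash or hang kills only the child process. Set Ray's max_restarts and max_task_retries actor options as a backstop. Screen out suspicious PDFs with a pre-flight check script before they reach the queue.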
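A minimal sketch of the isolation and pre-flight steps, using only the standard library. The function names are hypothetical, and `cmd` stands for whatever single-document parse command your pipeline runs. The header/trailer test is a cheap screen, not a validator: it catches truncated or empty files, not every malformed PDF.

```python
import signal
import subprocess
from pathlib import Path


def preflight_ok(pdf_path: Path, window: int = 1024) -> bool:
    """Cheap structural screen: a well-formed PDF carries a %PDF- header near
    the start and an %%EOF marker near the end. Truncated downloads and
    zero-byte files usually fail this check."""
    size = pdf_path.stat().st_size
    if size < 16:
        return False
    with pdf_path.open("rb") as f:
        head = f.read(window)
        f.seek(max(0, size - window))
        tail = f.read()
    return b"%PDF-" in head and b"%%EOF" in tail


def run_isolated(cmd: list[str], timeout_s: float) -> str:
    """Run one document's parse in a child process so a segfault or hang
    kills only the child, never the Ray actor that launched it."""
    try:
        # subprocess.run kills the child before raising TimeoutExpired.
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        return f"crashed:{signal.Signals(-proc.returncode).name}"
    return f"failed:{proc.returncode}"
```

Inside an actor you would call run_isolated once per file and record any result other than "ok" instead of letting the actor die. That leaves max_restarts and max_task_retries to cover crashes of the actor itself, rather than retrying the same poisoned PDF. Reporting signals as negative return codes is POSIX behavior.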

Error #3: Model Download Corruption

OSError: [Errno 5] Input/output error: 'models/layout/det_db_mv3.pdparams'
RuntimeError: PaddleCheck failed: Cannot open file

Why it happens: MinerU downloads 2GB+ of models on first run. A network interruption, a full disk, or a Docker layer-caching bug can corrupt the download without raising any error at download time.

Fix: Pre-download models in Dockerfile with checksum verification. Use a dedicated model cache volume. Run magic-pdf --verify-models before starting the queue.
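The checksum step can be sketched with hashlib. This sketch assumes you record a manifest from a download you have already confirmed works; it does not rely on any checksum list published by MinerU. The function names and the manifest format (relative path to SHA-256 hex digest) are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so multi-GB weights never sit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(model_dir: Path) -> dict[str, str]:
    """Record checksums for every file under model_dir, from a known-good copy."""
    return {p.relative_to(model_dir).as_posix(): sha256_of(p)
            for p in sorted(model_dir.rglob("*")) if p.is_file()}


def verify_models(model_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every file is present
    and matches its recorded SHA-256."""
    problems = []
    for rel_path, expected in sorted(manifest.items()):
        path = model_dir / rel_path
        if not path.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(path) != expected:
            problems.append(f"checksum mismatch: {rel_path}")
    return problems
```

Running verify_models in a Dockerfile build step and failing the build on a non-empty result keeps a corrupted layer from ever reaching the queue.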

Error #4: Version Conflict Hell

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
magic-pdf 2.x requires protobuf<=3.20, but ray 2.9 requires protobuf>=3.23

Why it happens: MinerU pins protobuf<=3.20 for PaddlePaddle compatibility, but Ray 2.9+ needs protobuf>=3.23. No protobuf release satisfies both ranges, so the two cannot share an environment without patching.

Fix: Use Ray 2.7 (last version compatible with protobuf 3.20), or apply the protobuf compatibility patch that monkey-patches Ray's gRPC imports. The guide includes the exact patch file.
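pip prints the conflict above but still completes the install, so a broken combination can reach production unnoticed. A startup guard that reads installed versions through importlib.metadata fails fast instead. This is a hypothetical sketch: the pins mirror the versions named in this section, and the parser compares numeric release segments only, so it is not a full PEP 440 implementation (the packaging library provides one).

```python
from importlib import metadata

# Pins taken from this section: Ray 2.7.x with protobuf at or below 3.20.
PINS = [("ray", ">=", "2.7"), ("ray", "<", "2.8"), ("protobuf", "<=", "3.20")]

_OPS = {
    "<": lambda c: c < 0,
    "<=": lambda c: c <= 0,
    "==": lambda c: c == 0,
    ">=": lambda c: c >= 0,
    ">": lambda c: c > 0,
}


def release(version: str) -> tuple[int, ...]:
    """Numeric release segments only: "3.20.3" -> (3, 20, 3), "4.25.1rc1" -> (4, 25, 1)."""
    nums = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        nums.append(int(digits))
        if len(digits) < len(piece):  # e.g. "1rc1": keep the 1, drop the suffix
            break
    return tuple(nums)


def compare(a: str, b: str) -> int:
    """Return -1, 0 or 1, padding missing segments with zeros ("3.20" == "3.20.0")."""
    x, y = release(a), release(b)
    width = max(len(x), len(y))
    x += (0,) * (width - len(x))
    y += (0,) * (width - len(y))
    return (x > y) - (x < y)


def satisfies(installed: str, op: str, bound: str) -> bool:
    return _OPS[op](compare(installed, bound))


def check_pins(pins) -> list[str]:
    """Return one message per violated pin; an empty list means all pins hold."""
    errors = []
    for dist, op, bound in pins:
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            errors.append(f"{dist}: not installed")
            continue
        if not satisfies(installed, op, bound):
            errors.append(f"{dist} {installed} violates {dist}{op}{bound}")
    return errors
```

Calling check_pins(PINS) before Ray starts, and exiting when it returns anything, turns a silent version drift into an immediate, readable failure. Note that `<=3.20` follows pip's semantics and rejects 3.20.3.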

Error #5: Empty Output / Silent Failure

# No error message. The output file is created, but it is empty.
$ wc -c output/0447ae2_origin.md
0 output/0447ae2_origin.md

Why it happens: MinerU's pipeline returns empty output when the layout detection model fails to find any text regions. Most common with: dark-background PDFs, scanned documents with poor contrast, and PDFs where text is embedded as curves/outlines.

Fix: Check the pipeline stage logs (not just the final output). Set --log-level DEBUG to see which stage failed. For curve-text PDFs, force OCR path with --force-ocr. Pre-process low-contrast documents with ImageMagick.
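Because this failure raises nothing, a post-run check is the only reliable way to catch it. A minimal sketch, assuming results land as .md files under one output directory; the function name and the min_chars threshold are hypothetical.

```python
from pathlib import Path


def find_empty_outputs(output_dir: Path, min_chars: int = 1) -> list[Path]:
    """Flag Markdown files whose text, after stripping whitespace, is shorter
    than min_chars. A zero-byte or whitespace-only .md is the only trace a
    silent layout-detection failure leaves behind."""
    flagged = []
    for md in sorted(output_dir.rglob("*.md")):
        text = md.read_text(encoding="utf-8", errors="replace")
        if len(text.strip()) < min_chars:
            flagged.append(md)
    return flagged
```

Flagged documents can then be re-queued on the OCR path or sent through contrast pre-processing, instead of being reported as successfully converted.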


Full chapter continues with:

15 additional errors with diagnosis + fix · Complete error reference table (symptom → cause → fix → prevention) · Debugging checklist for production incidents · Memory profiling script for leak detection · Orphan process cleanup automation · Log-level configuration per pipeline stage
