Document Intake — Quickstart

Prerequisite: you have already created the project virtual environment (.venv) in the repository root.

macOS system dependencies

Install poppler (needed by pdf2image):

brew install poppler
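
As an optional check that the install worked, the sketch below looks for the Poppler command-line tools on PATH. It assumes pdf2image relies on the pdftoppm and pdfinfo executables that Poppler ships; the helper name find_missing_tools is hypothetical and not part of this project.

```python
import shutil


def find_missing_tools(tools=("pdftoppm", "pdfinfo"), which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        # Assumption: these binaries come from the Homebrew poppler package.
        print("missing Poppler tools:", ", ".join(missing))
    else:
        print("Poppler tools found")
```

If the script reports missing tools, rerun brew install poppler and confirm that the Homebrew bin directory is on PATH in the shell where .venv is active.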

Python dependencies

Activate the virtual environment, then install the Python packages:

source .venv/bin/activate

pip install -r requirements.txt

If you need a specific paddlepaddle build (CPU or GPU), follow the official PaddlePaddle installation guide before, or instead of, running the pip install command above.

Check that paddleocr and pymongo import successfully:

python -c "import paddleocr; import pymongo; print('imports OK')"
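
The one-liner stops at the first import that fails. As a slightly longer alternative, the sketch below uses only the standard library and reports each package separately. The helper name check_imports is hypothetical; the package list simply mirrors the two packages named above.

```python
import importlib


def check_imports(modules):
    """Try to import each module and return {name: None or error text}."""
    results = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:
            # Heavy packages can fail with errors other than ImportError
            # (for example a missing native library), so catch broadly.
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for name, error in check_imports(["paddleocr", "pymongo"]).items():
        print(f"{name}: {'OK' if error is None else error}")
```

A package that prints an error here is the one to reinstall or to check against the paddlepaddle note above.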

We will add a prototype script processor.py that:

To run the prototype (once added):

python processor.py --input samples/example.pdf
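
Since processor.py does not exist yet, the following is only a sketch of how its command-line entry point might accept the --input flag used above. Everything beyond that flag is an assumption: the function names parse_args and main are hypothetical, and the actual OCR and storage steps are deliberately left out.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the command line; only --input is taken from the quickstart."""
    parser = argparse.ArgumentParser(
        description="Document intake prototype (sketch)"
    )
    parser.add_argument(
        "--input", required=True, type=Path, help="path to the PDF to process"
    )
    return parser.parse_args(argv)


def main(argv=None):
    """Validate the input path and return a process exit code."""
    args = parse_args(argv)
    if not args.input.is_file():
        print(f"input not found: {args.input}")
        return 1
    # Placeholder: the real processing steps are not defined yet.
    print(f"would process: {args.input}")
    return 0
```

Saved as processor.py, the file would end with the usual `if __name__ == "__main__":` guard calling `raise SystemExit(main())`, so that the command above returns a non-zero exit code when the input file is missing.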

If you want, I can now add a minimal processor.py prototype and a samples/ folder with a placeholder PDF. Which should I do next?