# Document Intake — Quickstart

Prerequisite: you have already created the project virtual environment `.venv` in the repository root.

## macOS system deps

Install `poppler` (needed by `pdf2image`):

```bash
brew install poppler
```

## Activate the virtual environment

```bash
source .venv/bin/activate
```

## Install Python dependencies

```bash
pip install -r requirements.txt
```

If you need a specific `paddlepaddle` flavor (CPU or GPU), follow the official PaddlePaddle install guide before (or instead of) running the command above.

## Quick verification

Check that `paddleocr` and `pymongo` import successfully:

```bash
python -c "import paddleocr; import pymongo; print('imports OK')"
```

## Running a processor (prototype)

We will add a prototype script, `processor.py`, that:

- Converts pages from a PDF to images using `pdf2image`.
- Runs OCR on one page with `paddleocr`.
- Prints basic extraction results.

Once the script is added, run it with:

```bash
python processor.py --input samples/example.pdf
```

## Useful files

- Development notes: [DEVELOPMENT.md](DEVELOPMENT.md)
- Python dependencies: [requirements.txt](requirements.txt)
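The quick-verification one-liner only checks Python imports, but `pdf2image` also shells out to poppler's command-line tools (`pdfinfo` and `pdftoppm`) at runtime. A small stdlib-only check along these lines could cover both; the module and executable lists are assumptions based on the dependencies this guide mentions, and `missing_requirements` is a hypothetical helper, not part of the repository.

```python
"""Hypothetical environment check: Python packages plus the poppler tools pdf2image calls."""
import importlib.util
import shutil


def missing_requirements(
    modules=("paddleocr", "pymongo", "pdf2image"),
    binaries=("pdftoppm", "pdfinfo"),
):
    """Return a list of human-readable problems; an empty list means everything was found."""
    problems = []
    for name in modules:
        # find_spec locates a top-level package without importing it (importing paddleocr is slow).
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python package not importable: {name}")
    for exe in binaries:
        # pdf2image runs poppler's pdfinfo/pdftoppm as subprocesses, so they must be on PATH.
        if shutil.which(exe) is None:
            problems.append(f"executable not on PATH: {exe}")
    return problems


problems = missing_requirements()
print("\n".join(problems) if problems else "environment OK")
```

Run it with the virtual environment activated; a missing `pdftoppm` usually means the `brew install poppler` step was skipped or Homebrew's `bin` directory is not on `PATH`.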
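`processor.py` does not exist yet. The sketch below is one possible shape for it, following the three steps listed under "Running a processor (prototype)". It assumes the PaddleOCR 2.x Python API (`PaddleOCR(use_angle_cls=True, lang="en")` and `.ocr(image, cls=True)`, which returns one list of `[box, (text, score)]` entries per image, or `None` when nothing is detected); PaddleOCR 3.x changed this interface, so check the installed version first. The `--page` and `--dpi` flags, the English model, and all helper names are illustrative assumptions, not project decisions.

```python
"""Hypothetical sketch of processor.py; assumes the PaddleOCR 2.x API."""
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="OCR one page of a PDF (prototype).")
    parser.add_argument("--input", required=True, help="path to the input PDF")
    # --page and --dpi are illustrative extras, not flags defined by the project.
    parser.add_argument("--page", type=int, default=1, help="1-based page number to OCR")
    parser.add_argument("--dpi", type=int, default=200, help="rasterization resolution")
    return parser.parse_args(argv)


def extract_lines(ocr_result):
    """Flatten PaddleOCR 2.x output into (text, score) pairs.

    Assumes ocr_result[0] holds the entries for the single image passed in,
    each shaped [box, (text, score)], or is None when no text was detected.
    """
    page = ocr_result[0] if ocr_result else None
    if not page:
        return []
    return [(text, float(score)) for _box, (text, score) in page]


def ocr_page(pdf_path, page, dpi):
    # Heavy imports are deferred so argument parsing works without poppler or PaddleOCR.
    import numpy as np
    from paddleocr import PaddleOCR
    from pdf2image import convert_from_path

    # Rasterize only the requested page (pdf2image calls poppler under the hood).
    images = convert_from_path(pdf_path, dpi=dpi, first_page=page, last_page=page)
    if not images:
        raise ValueError(f"page {page} not found in {pdf_path}")
    # pdf2image yields RGB PIL images; flip to BGR (OpenCV channel order) in a contiguous array.
    bgr = np.ascontiguousarray(np.array(images[0].convert("RGB"))[:, :, ::-1])
    # The first PaddleOCR(...) call downloads model files if they are not cached yet.
    ocr = PaddleOCR(use_angle_cls=True, lang="en")
    return ocr.ocr(bgr, cls=True)


def main(argv=None):
    """Parse arguments, OCR the requested page, and print each recognized line."""
    args = parse_args(argv)
    lines = extract_lines(ocr_page(args.input, args.page, args.dpi))
    print(f"{args.input} (page {args.page}): {len(lines)} text lines")
    for text, score in lines:
        print(f"{score:.2f}  {text}")
    return 0
```

To run it as `python processor.py --input samples/example.pdf`, append an entry point such as `if __name__ == "__main__": raise SystemExit(main())`. Keeping `extract_lines` separate from the OCR call means the handling of PaddleOCR's nested output can be checked without poppler or a model download.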