Initial commit

2026-01-01 21:57:33 -08:00
commit d246d2a0d7
6 changed files with 285 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,56 @@
+# Document Intake — Quickstart
+
+Prerequisite: a project virtual environment named `.venv` already exists in the repository root (if it does not, create one with `python3 -m venv .venv`).
+
+## macOS system deps
+
+Install `poppler` (needed by `pdf2image`):
+
+```bash
+brew install poppler
+```
+
+## Activate the virtual environment
+
+```bash
+source .venv/bin/activate
+```
+
+## Install Python dependencies
+
+```bash
+pip install -r requirements.txt
+```
+
+If you need a specific `paddlepaddle` build (CPU or GPU), follow the official PaddlePaddle install guide before running the command above, or instead of it.
+
+## Quick verification
+
+Check that `paddleocr` and `pymongo` import successfully:
+
+```bash
+python -c "import paddleocr; import pymongo; print('imports OK')"
+```
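+
+For a report that names each missing package instead of stopping at the first failed import, a small script along these lines may help. It is only a sketch: the module list is an assumption (it adds `pdf2image`, which the prototype below relies on), and `missing_modules` is a hypothetical helper name.
+
+```python
+import importlib.util
+
+# Assumed module list; adjust it to match requirements.txt.
+REQUIRED_MODULES = ("paddleocr", "pymongo", "pdf2image")
+
+
+def missing_modules(names=REQUIRED_MODULES):
+    """Return the names that cannot be located on the import path."""
+    return [name for name in names if importlib.util.find_spec(name) is None]
+
+
+missing = missing_modules()
+print("imports OK" if not missing else "missing: " + ", ".join(missing))
+```
+
+`find_spec` only locates a package and does not execute it, so a package with a broken native library can still pass this check. The one-line `python -c` command above remains the stricter test.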
+
+## Running a processor (prototype)
+
+We will add a prototype script `processor.py` that:
+
+- Converts pages from a PDF to images using `pdf2image`.
+- Runs OCR on one page with `paddleocr`.
+- Prints basic extraction results.
+
+To run the prototype (once added):
+
+```bash
+python processor.py --input samples/example.pdf
+```
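+
+Until the script exists, the three steps listed above could be sketched roughly as follows. This is not the final `processor.py`. It assumes the PaddleOCR 2.x result layout (one list per image, each entry shaped like `[box, (text, score)]`) and English text (`lang="en"`). The function names are hypothetical.
+
+```python
+import argparse
+
+
+def parse_args(argv=None):
+    """Parse the --input flag used in the run command above."""
+    parser = argparse.ArgumentParser(description="OCR the first page of a PDF.")
+    parser.add_argument("--input", required=True, help="path to a PDF file")
+    return parser.parse_args(argv)
+
+
+def extract_lines(ocr_result):
+    """Flatten a PaddleOCR 2.x style result into (text, score) pairs.
+
+    Assumed layout: one list per image, each entry [box, (text, score)].
+    A page with no detected text may come back as None.
+    """
+    lines = []
+    for page in ocr_result or []:
+        for _box, (text, score) in page or []:
+            lines.append((text, float(score)))
+    return lines
+
+
+def main(argv=None):
+    args = parse_args(argv)
+
+    # Heavy imports stay inside main so the helpers above load without them.
+    import numpy as np
+    from paddleocr import PaddleOCR
+    from pdf2image import convert_from_path
+
+    # Render only the first page; pdf2image needs poppler installed.
+    pages = convert_from_path(args.input, first_page=1, last_page=1)
+
+    ocr = PaddleOCR(lang="en")  # assumption: English documents
+    result = ocr.ocr(np.array(pages[0]))
+
+    for text, score in extract_lines(result):
+        print(f"{score:.2f}  {text}")
+```
+
+To use it as a script, end the file with `if __name__ == "__main__": main()`. Newer PaddleOCR releases may return results in a different shape, in which case `extract_lines` would need adjusting.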
+
+## Useful files
+
+- Development notes: [DEVELOPMENT.md](DEVELOPMENT.md)
+- Python dependencies: [requirements.txt](requirements.txt)
+