Using the magic of Tesseract OCR to extract text from scanned PDFs (ones that weren't machine-written)
The scanner needs uv, poppler-utils (which pdf2image relies on) and tesseract-ocr installed. On Debian/Ubuntu:
curl -LsSf https://astral.sh/uv/install.sh | sh
sudo apt install poppler-utils tesseract-ocr
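To confirm the installs worked, a small check like the one below can help. It is a hypothetical helper, not part of this project. It assumes poppler-utils provides pdftoppm (the tool pdf2image calls by default) and tesseract-ocr provides tesseract, and it only looks for each executable on PATH.

```python
import shutil
import sys

# Hypothetical helper: maps the executable we look for to the package
# that is assumed to provide it.
REQUIRED = {
    "uv": "uv",
    "pdftoppm": "poppler-utils",   # used by pdf2image to rasterise pages
    "tesseract": "tesseract-ocr",  # the OCR engine itself
}


def missing_packages() -> list[str]:
    """Return the packages whose executable cannot be found on PATH."""
    return [pkg for exe, pkg in REQUIRED.items() if shutil.which(exe) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("missing: " + ", ".join(missing), file=sys.stderr)
    else:
        print("all dependencies found")
```

Run it with python3; an empty result means every tool was found.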
Then run the scanner on a folder of PDFs:
uv run scanner <folder with pdf docs>
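With pdf2image and Tesseract, the pipeline most likely renders each PDF page to an image and OCRs those images. The sketch below is a hypothetical illustration of that idea, not this project's actual code. It assumes the pytesseract wrapper (not mentioned in this README), a 300 DPI render, and that the text for each PDF is written to a .txt file next to it.

```python
import sys
from pathlib import Path


def find_pdfs(folder: str) -> list[Path]:
    """Return every *.pdf file directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def ocr_pdf(pdf_path: Path, dpi: int = 300) -> str:
    """Render each page with poppler (via pdf2image), then OCR it with Tesseract."""
    # Imported lazily so find_pdfs() works without the OCR stack installed.
    from pdf2image import convert_from_path  # wraps poppler's pdftoppm
    import pytesseract  # assumption: thin Python wrapper around the tesseract binary

    pages = convert_from_path(str(pdf_path), dpi=dpi)  # one PIL image per page
    return "\n".join(pytesseract.image_to_string(page) for page in pages)


def main(folder: str) -> None:
    for pdf in find_pdfs(folder):
        text = ocr_pdf(pdf)
        # Assumed output location: scan.pdf -> scan.txt in the same folder.
        pdf.with_suffix(".txt").write_text(text, encoding="utf-8")
        print(f"{pdf.name}: {len(text)} characters")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: scanner <folder with pdf docs>", file=sys.stderr)
```

Rendering at a higher DPI usually helps Tesseract on small or faint print, at the cost of memory and time per page.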