A chatbot that answers questions about your documents, including PDFs, Word documents, text files, and images that contain text.
- Multiple Document Types: Process PDFs, Word documents (.docx), text files, and images (.jpg, .png, etc.)
- Built-in OCR: Extract text from images and scanned documents using EasyOCR
- Semantic Search: Find relevant information across all your documents
- Claude AI Integration: Get intelligent, human-like responses to your questions
- User-Friendly Interface: Easy-to-use web interface built with Streamlit
- Python 3.8+
- Claude API key (from Anthropic)
- Clone this repository or download the files
- Create a virtual environment and activate it:
python -m venv venv
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Create a .env file in the project directory with your Claude API key:
ANTHROPIC_API_KEY=your_api_key_here
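To check that the key is set up correctly before starting the app, a small stdlib-only script along these lines may help. It is a sketch, not the project's own loader: `load_env_file` and `get_api_key` are hypothetical helper names, and the project itself may read the file differently (for example through a dotenv library).

```python
import os
from pathlib import Path


def load_env_file(path):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(env_path=".env"):
    """Return the key from the environment, falling back to the .env file."""
    key = os.environ.get("ANTHROPIC_API_KEY")
    if not key and Path(env_path).is_file():
        key = load_env_file(env_path).get("ANTHROPIC_API_KEY")
    # Reject a missing key or the unedited placeholder from the example line.
    if not key or key == "your_api_key_here":
        raise RuntimeError("ANTHROPIC_API_KEY is missing or still a placeholder")
    return key
```

Calling `get_api_key()` from the project directory raises an error if the key is missing or still set to the placeholder value.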
Place your document files in a folder called documents in the project directory. The system supports:
- PDF files (.pdf)
- Word documents (.docx, .doc)
- Text files (.txt)
- Image files with text (.jpg, .jpeg, .png, .bmp, .tiff, .tif)
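To preview which files in the folder would be recognized, the extension list above can be turned into a short check like the following. This is a sketch under assumptions: `SUPPORTED_EXTENSIONS` and `find_supported_files` are hypothetical names, the match is case-insensitive, and subfolders are searched recursively, which may not match what the processing script actually does.

```python
from collections import defaultdict
from pathlib import Path

# Extensions taken from the list above, grouped by document type.
SUPPORTED_EXTENSIONS = {
    ".pdf": "pdf",
    ".docx": "word",
    ".doc": "word",
    ".txt": "text",
    ".jpg": "image",
    ".jpeg": "image",
    ".png": "image",
    ".bmp": "image",
    ".tiff": "image",
    ".tif": "image",
}


def find_supported_files(docs_folder="documents"):
    """Return a dict mapping document type to a sorted list of matching paths."""
    folder = Path(docs_folder)
    if not folder.is_dir():
        return {}
    grouped = defaultdict(list)
    # Assumption: subfolders are included; the real script may not recurse.
    for path in sorted(folder.rglob("*")):
        kind = SUPPORTED_EXTENSIONS.get(path.suffix.lower())
        if kind and path.is_file():
            grouped[kind].append(path)
    return dict(grouped)
```

Files with any other extension are simply left out of the result.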
Run the following command to process your documents and create the vector database:
python process_documents.py
You can specify custom folders if needed:
python process_documents.py --docs_folder custom_docs --db_folder custom_db
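For readers curious how the two flags might be wired up, here is a minimal argparse sketch. Only the flag names `--docs_folder` and `--db_folder` and the `documents` folder name come from this README; the `vector_db` default and the `parse_args` helper are assumptions for illustration, not the script's actual code.

```python
import argparse


def parse_args(argv=None):
    """Parse the folder options; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(
        description="Process documents and build the vector database."
    )
    parser.add_argument(
        "--docs_folder",
        default="documents",
        help="Folder containing the source documents.",
    )
    parser.add_argument(
        "--db_folder",
        default="vector_db",  # Hypothetical default; check the real script.
        help="Folder where the vector database is written.",
    )
    return parser.parse_args(argv)
```

With this shape, `parse_args(["--docs_folder", "custom_docs", "--db_folder", "custom_db"])` yields an object whose `docs_folder` and `db_folder` attributes hold the custom paths.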
Then start the app:
streamlit run app.py
Go to http://localhost:8501 in your browser if the app doesn't open automatically.
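If the page does not load, it can help to confirm that something is listening on the port at all. The following is a small sketch using only the standard library; `is_port_open` is a hypothetical helper, and 8501 is assumed to be the port in use because it is Streamlit's default.

```python
import socket


def is_port_open(host="localhost", port=8501, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not reachable.
        return False
```

A `False` result while the app is supposed to be running suggests it failed to start or is using a different port; the terminal output of the streamlit command shows the actual URL.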