A Streamlit-based application that uses RAG (Retrieval-Augmented Generation) to answer questions about your PDF documents. The application processes your documents, creates embeddings, and uses them to retrieve relevant passages, so each answer comes with source citations.
- Upload and process PDF documents
- Interactive chat interface
- Source citations with viewable context
- Hybrid search for better document retrieval
- Error handling and progress tracking
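This README does not describe how the hybrid search is implemented. One common reading of the term is a weighted blend of keyword matching and embedding similarity, and the sketch below illustrates that idea only. The function names (`keyword_score`, `hybrid_rank`), the `alpha` weight, and the toy two-dimensional vectors are assumptions made for illustration, not code from this repository.

```python
import math


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    tokens = (t.strip(".,;:!?()\"'").lower() for t in text.split())
    return [t for t in tokens if t]


def keyword_score(query, doc):
    """Fraction of distinct query terms that also appear in the document."""
    q_terms = set(tokenize(query))
    if not q_terms:
        return 0.0
    return len(q_terms & set(tokenize(doc))) / len(q_terms)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_rank(query, query_vec, docs, doc_vecs, alpha=0.5):
    """Return document indices, best first.

    Score = alpha * vector similarity + (1 - alpha) * keyword overlap.
    alpha is a hypothetical tuning weight, not a setting of this project.
    """
    scored = []
    for i, (doc, vec) in enumerate(zip(docs, doc_vecs)):
        score = (alpha * cosine_similarity(query_vec, vec)
                 + (1 - alpha) * keyword_score(query, doc))
        scored.append((score, i))
    # Sort by descending score; ties fall back to original order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored]


if __name__ == "__main__":
    docs = ["The invoice total is 42 dollars.", "Cats sleep most of the day."]
    doc_vecs = [[1.0, 0.0], [0.0, 1.0]]  # stand-ins for real embeddings
    print(hybrid_rank("invoice total", [0.9, 0.1], docs, doc_vecs))
```

In a real pipeline the vectors would come from an embedding model and the keyword side would more likely use a scorer such as BM25. The blend is the point: setting `alpha` to 0 gives pure keyword ranking and 1 gives pure vector ranking.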
- Python 3.8 or higher
- pip (Python package installer)
- Virtual environment (recommended)
- Clone the repository:
  git clone https://github.com/jmiano/pdf-chat.git
  cd pdf-chat
- Create and activate a virtual environment:
  python -m venv venv
  source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
- Place your PDF documents in the data/ directory
- Build the document index:
  python src/build_index.py
- Run the Streamlit app:
  streamlit run src/app.py
- Open your browser and navigate to the URL shown in the terminal (usually http://localhost:8501)
pdf-chat/
├── data/ # Directory for PDF documents
├── src/ # Source code
│ ├── app.py # Main Streamlit application
│ └── build_index.py # Document indexing script
├── tests/ # Unit tests
├── .env # Environment variables (create this file)
├── .gitignore # Git ignore rules
└── requirements.txt # Python dependencies
Create a .env file in the root directory with your configuration:
# Example configuration
OPENAI_API_KEY=your_api_key_here

Run the unit tests:
python -m pytest tests/ -v

The code follows PEP 8 guidelines and includes comprehensive docstrings.
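As a rough illustration of how the `.env` configuration might be consumed, the sketch below reads `OPENAI_API_KEY` from the process environment and falls back to a simple `KEY=VALUE` parse of the `.env` file. The helpers `load_env_file` and `get_api_key` are hypothetical and use only the standard library. The project itself may load its configuration differently, for example through a dedicated dotenv package.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env-style file into a dict.

    Blank lines, comment lines starting with '#', and lines without '='
    are skipped. Surrounding quotes on values are removed.
    """
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def get_api_key(path=".env"):
    """Return OPENAI_API_KEY, preferring the environment over the file."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
    # Treat the README placeholder as "not configured".
    if not key or key == "your_api_key_here":
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env")
    return key
```

Failing early on a missing or placeholder key gives a clearer message than an authentication error surfacing later, during indexing or chat.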
This project is licensed under the MIT License - see the LICENSE file for details.
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request