A document question answering system that answers natural-language questions about your PDFs using Gemini and a Chroma vector database.
| Feature | Description |
|---|---|
| 📄 Multi-PDF Processing | Upload and analyze up to 3 PDFs simultaneously (300 pages max each) |
| 💬 Natural Language Interface | Ask questions in plain English about your documents |
| 🧠 Smart Context Understanding | Gemini AI provides accurate answers based on document content |
| ⚡ Fast Retrieval | Chroma vector database enables quick information lookup |
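
The upload limits in the table (3 files, 300 pages each) could be enforced before any processing starts. The following is a minimal sketch, not the app's actual code: `validate_uploads` and its input shape (a mapping from file name to page count) are hypothetical names introduced for illustration.

```python
from typing import Dict, List

MAX_FILES = 3    # file limit stated in the feature table
MAX_PAGES = 300  # per-file page limit stated in the feature table


def validate_uploads(page_counts: Dict[str, int]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK.

    page_counts maps a file name to its page count (hypothetical input shape).
    """
    problems = []
    if len(page_counts) > MAX_FILES:
        problems.append(f"Too many files: {len(page_counts)} (max {MAX_FILES})")
    for name, pages in page_counts.items():
        if pages > MAX_PAGES:
            problems.append(f"{name}: {pages} pages (max {MAX_PAGES})")
    return problems
```

Returning a list of problems rather than raising on the first one lets a UI report every rejected file at once.
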

    graph TD
        A[PDF Upload] --> B[Text Extraction]
        B --> C[Chunking]
        C --> D[Vector Embeddings]
        D --> E[Chroma DB Storage]
        E --> F[User Query]
        F --> G[Relevant Chunk Retrieval]
        G --> H[Gemini Answer Generation]
        H --> I[Response Display]

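
To make the chunking and retrieval steps of the pipeline concrete, here is a dependency-free sketch. It is an illustration under stated assumptions, not the project's implementation: word-count vectors stand in for real embeddings, an in-memory list stands in for Chroma, and the function names and default sizes are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks that overlap slightly."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, at the cost of storing some text twice.
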
| Category | Libraries |
|---|---|
| Framework | langchain, langchain_community |
| AI Models | langchain_google_genai (Gemini) |
| Vector DB | langchain_chroma |
| PDF Processing | pypdf, pdfminer.six, unstructured |
| Utilities | python-dotenv, nest_asyncio, sentence-transformers |
| UI | streamlit |
- Python 3.8+
- Google API key with Gemini access
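
Both prerequisites can be checked before the app starts. The helper below is a hypothetical sketch (`check_prerequisites` is not part of the project); it assumes the key is exposed through the `GOOGLE_API_KEY` environment variable and only confirms that the variable is set, not that the key actually has Gemini access.

```python
import os
import sys
from typing import List, Mapping, Tuple


def check_prerequisites(
    env: Mapping[str, str] = os.environ,
    version: Tuple[int, int] = sys.version_info[:2],
) -> List[str]:
    """Return a list of unmet prerequisites; an empty list means all good."""
    problems = []
    if version < (3, 8):
        problems.append(f"Python 3.8+ required, found {version[0]}.{version[1]}")
    # Presence check only: this cannot tell whether the key has Gemini access.
    if not env.get("GOOGLE_API_KEY", "").strip():
        problems.append("GOOGLE_API_KEY is not set")
    return problems
```

Passing `env` and `version` as parameters keeps the check easy to test without touching the real environment.
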
- Clone the repository:

      git clone https://github.com/yourusername/neuroquery.git
      cd neuroquery

- Create and activate a virtual environment:

      python -m venv venv
      source venv/bin/activate  # Linux/Mac
      venv\Scripts\activate     # Windows

- Install dependencies:

      pip install -r requirements.txt

- Create a `.env` file:

      GOOGLE_API_KEY=your_api_key_here

Run the app:

    streamlit run app.py

| Platform | Instructions |
|---|---|
| Streamlit Cloud | Deploy Guide |
| Hugging Face | Spaces Guide |
| AWS/Azure | Use Docker with Streamlit server |
- Upload PDF documents (max 3 files)
- Wait for processing to complete
- Ask questions about the document content
- View AI-generated answers with source references
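
One way answers can carry source references is to number the retrieved excerpts in the prompt and ask the model to cite them. The sketch below illustrates that idea only; the prompt wording, the `build_prompt` name, and the `(file name, page, text)` tuple shape are assumptions, not the app's actual prompt.

```python
from typing import List, Tuple


def build_prompt(question: str, sources: List[Tuple[str, int, str]]) -> str:
    """Assemble a prompt from (file name, page number, chunk text) tuples."""
    lines = [
        "Answer the question using only the numbered excerpts below.",
        "Cite excerpts by number, e.g. [1].",
        "",
    ]
    # Number each excerpt so the model's citations map back to file and page.
    for i, (name, page, text) in enumerate(sources, start=1):
        lines.append(f"[{i}] {name}, p. {page}: {text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)
```

Because each number maps to a file and page, a UI can turn a citation such as `[1]` back into a visible reference.
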
- Processing Errors: Ensure PDFs contain selectable text (not scanned images)
- API Errors: Verify your Google API key has Gemini access
- Performance: For large documents, increase the chunk size in `config.py`
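
The scanned-image case from the first item can be detected early by looking for pages whose extracted text is empty. This is a sketch under assumptions: `pages_without_text` and the `min_chars` threshold are hypothetical, and the per-page strings are assumed to come from a text extractor such as pypdf's `page.extract_text()`.

```python
from typing import List


def pages_without_text(page_texts: List[str], min_chars: int = 20) -> List[int]:
    """Return 1-based page numbers whose extracted text is (nearly) empty."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len(text.strip()) < min_chars
    ]
```

If most pages are flagged, the PDF is likely a scan and would need OCR before it can be queried.
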
Developed with ❤️ by Jasjeev Singh Kohli

