This project is a Retrieval-Augmented Generation (RAG) chatbot built with Streamlit + LangChain + OpenAI.
It allows you to upload PDF files, query them in natural language, and get AI-powered answers with sources.
- 📄 Upload multiple PDFs
- ✂️ Split text into smart chunks
- 🧠 Generate embeddings with OpenAI
- 💾 Store + retrieve chunks using ChromaDB
- 🤖 Ask questions & get contextual answers with sources shown
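The "smart chunks" step above splits each document into overlapping, fixed-size pieces so context isn't cut mid-thought. The app relies on LangChain's text splitter for this; the stdlib sketch below (function name and sizes are illustrative, not the app's actual code) shows the core idea:

```python
# Illustrative sketch of the chunking step: fixed-size chunks with a small
# overlap so consecutive chunks share context at their boundary.
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into ~chunk_size-character chunks; consecutive chunks
    share `overlap` characters."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

doc = "x" * 1200
print(len(split_into_chunks(doc)))  # 3 chunks: 0-500, 450-950, 900-1200
```

The overlap is what makes the chunks "smart": a sentence straddling a boundary still appears whole in at least one chunk.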
We’ve made several significant upgrades and fixes to evolve this chatbot from a simple prototype into a robust, fully conversational application.
- Before: Each query was treated independently. Follow-ups like “why?” made no sense.
- After: Implemented a history-aware RAG chain that reformulates follow-ups into full contextual questions (e.g., “why?” after asking about the sky’s color becomes “why is the sky blue?”).
➤ The chatbot now maintains multi-turn memory, enabling natural and contextually aware conversations.
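Conceptually, the history-aware step prompts the model to rewrite a terse follow-up into a standalone question before retrieval. The app uses LangChain's built-in chain for this; the hand-rolled prompt builder below (our own illustrative names, not the library's API) shows what that reformulation prompt looks like:

```python
# Sketch of the history-aware reformulation step: combine prior turns with
# the terse follow-up into a prompt asking for a standalone question.
def build_reformulation_prompt(history: list[tuple[str, str]], follow_up: str) -> str:
    lines = ["Given the chat history, rewrite the follow-up as a standalone question.", ""]
    for user_msg, ai_msg in history:
        lines.append(f"User: {user_msg}")
        lines.append(f"AI: {ai_msg}")
    lines.append(f"Follow-up: {follow_up}")
    return "\n".join(lines)

prompt = build_reformulation_prompt(
    [("What color is the sky?", "The sky is blue.")],
    "why?",
)
print(prompt)
```

The reformulated question ("why is the sky blue?") is then what gets embedded and sent to the retriever, so the fetched chunks match the full intent, not just the word "why".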
- Before: Used outdated callbacks (`StreamHandler`) that caused `NoSessionContext` errors.
- After: Replaced with LangChain’s modern `.stream()` method.
➤ This allows real-time response streaming directly in Streamlit without threading issues.
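The streaming loop is simple in shape: `chain.stream(...)` returns a generator of chunks, and Streamlit renders them incrementally. The sketch below fakes the chain with a plain generator (the `fake_stream` name and word-level "tokens" are ours, for illustration only):

```python
# Minimal sketch of token streaming: the chain yields chunks one at a time,
# which Streamlit can render incrementally, e.g.
#   st.write_stream(chain.stream({"input": question}))
def fake_stream(answer: str):
    """Stand-in for chain.stream(): yield the answer word by word."""
    for token in answer.split(" "):
        yield token + " "

streamed = "".join(fake_stream("The sky is blue."))
print(streamed)  # "The sky is blue. "
```

Because the generator runs in the script's own thread, there is no cross-thread session access, which is what made the old callback approach fail.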
- 🔄 Circular Import Fixed: Renamed local `langchain.py` → `app.py` to avoid module conflicts.
- 🔕 Disabled ChromaDB telemetry logs for cleaner terminal output.
In summary, the chatbot is now stable, conversational, and fully interactive.
- 📂 Multi-PDF Support — Upload and query multiple documents
- 🧩 Chunking & Embedding — Splits content for better context retrieval
- 🔍 RAG Pipeline — Retrieval + Context-aware AI answers
- 🧠 Conversational Memory — Handles follow-up questions seamlessly
- 💬 Real-time Streaming — Smooth token-by-token response in Streamlit
- 📊 Source Transparency — Displays top 3 document sources
- ⚡ Streamlit UI — Simple and interactive interface
| Layer | Technologies | Purpose |
|---|---|---|
| Frontend | Streamlit | Interactive UI |
| Backend | Python, LangChain | RAG pipeline & orchestration |
| Vector DB | ChromaDB | Store & retrieve embeddings |
| Document Loader | PyMuPDF | Parse PDF files |
| LLM + Embeddings | OpenAI (GPT + embeddings) | Contextual QA |
- 📥 Upload PDFs — User uploads documents via Streamlit UI
- ✂️ Text Splitting — Documents are chunked into smaller passages
- 🔑 Embedding — Each chunk is embedded using OpenAI embeddings
- 💾 Vector Store — Chunks + embeddings stored in ChromaDB
- ❓ Query — User asks a question
- 🔍 Retriever — Relevant chunks are retrieved
- 🤖 LLM Response — GPT answers using retrieved context
- 📑 Sources — Top 3 supporting chunks shown
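Steps 3–6 and 8 above boil down to: embed each chunk, embed the query, and return the chunks whose embeddings are most similar. A real run uses OpenAI embeddings and ChromaDB; the stdlib toy below substitutes word-count vectors and cosine similarity (all names are illustrative) to make the retrieval mechanics concrete:

```python
# Toy retrieval pipeline: "embed" chunks as word-count vectors, score them
# against the query by cosine similarity, and return the top 3 sources.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in for an embedding model: lowercase bag-of-words counts.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return Counter(cleaned.split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(chunks: list[str], query: str, k: int = 3) -> list[str]:
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:k]

chunks = [
    "Rayleigh scattering makes the sky appear blue.",
    "Invoice totals are due within 30 days.",
    "The report covers quarterly revenue.",
    "Blue light scatters more than red light in the sky.",
]
print(top_k(chunks, "why is the sky blue?"))
```

The top-k list is exactly what the app surfaces as its "top 3 sources" alongside the LLM's answer; the retrieved chunks are also stuffed into the prompt as context for the generation step.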
- Python 3.9+
- OpenAI API Key
```bash
git clone https://github.com/your-username/rag-file-chatbot.git
cd rag-file-chatbot
pip install -r requirements.txt
```

Create a `.env` file (see `.env.example`):

```
OPENAI_API_KEY=your_openai_key
```

Run the app:

```bash
streamlit run app.py
```

The app will run locally at 👉 http://localhost:8501
```
rag-file-chatbot/
├── app.py              # Main Streamlit app
├── requirements.txt    # Python dependencies
├── .env.example        # Example API keys
├── .gitignore
└── README.md
```
Built with ❤️ by Kartik Garg