
📄 RAG File QA Chatbot

Chat with your PDFs using LangChain + Streamlit + OpenAI

Streamlit LangChain OpenAI ChromaDB PyMuPDF


📌 Overview

This project is a Retrieval-Augmented Generation (RAG) chatbot built with Streamlit + LangChain + OpenAI.
It allows you to upload PDF files, query them in natural language, and get AI-powered answers with sources.

  • 📄 Upload multiple PDFs
  • ✂️ Split text into smart chunks
  • 🧠 Generate embeddings with OpenAI
  • 💾 Store + retrieve chunks using ChromaDB
  • 🤖 Ask questions & get contextual answers with sources shown

🚀 Recent Improvements & Fixes

We’ve made several significant upgrades and fixes to evolve this chatbot from a simple prototype to a robust, production-ready conversational application.

🧠 1. Added Conversational Memory

  • Before: Each query was treated independently. Follow-ups like “why?” made no sense.
  • After: Implemented a history-aware RAG chain that reformulates follow-ups into full standalone questions (e.g., “why?” becomes “why is the sky blue?”).
    ➤ The chatbot now maintains multi-turn memory, enabling natural and contextually aware conversations.
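The reformulation step can be sketched in plain Python. This is a hypothetical helper, not the app's code; in LangChain the same idea is typically wired up with `create_history_aware_retriever`:

```python
# Sketch of history-aware question reformulation: before retrieval, the
# follow-up is rewritten into a standalone question using the chat
# history. All names here are illustrative.

def build_reformulation_prompt(history, follow_up):
    """Assemble the prompt an LLM would receive to rewrite a follow-up."""
    transcript = "\n".join(f"{role}: {text}" for role, text in history)
    return (
        "Given the chat history below, rewrite the final user question "
        "so it can be understood without the history.\n\n"
        f"{transcript}\nuser: {follow_up}"
    )

history = [
    ("user", "What makes the sky appear blue?"),
    ("assistant", "Rayleigh scattering of sunlight."),
]
prompt = build_reformulation_prompt(history, "why?")
print(prompt)
```

Feeding this prompt to the LLM yields a standalone question such as “why is the sky blue?”, and that rewritten question is what the retriever actually sees.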

⚡ 2. Modernized Streaming Engine for Streamlit

  • Before: Used outdated callbacks (StreamHandler) that caused NoSessionContext errors.
  • After: Replaced with LangChain’s modern .stream() method.
    ➤ This allows real-time response streaming directly in Streamlit without threading issues.
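The new streaming flow can be illustrated with a stand-in generator in place of the chain's `.stream()` iterator (the Streamlit rendering call is shown as a comment, since it needs a running app):

```python
# Simulates consuming chain.stream(...) and rendering incrementally,
# the way a Streamlit placeholder updated per chunk would.

def fake_stream(answer):
    """Stand-in for chain.stream(query): yields the answer token by token."""
    for token in answer.split():
        yield token + " "

rendered = ""
for chunk in fake_stream("The sky is blue due to Rayleigh scattering."):
    rendered += chunk  # in Streamlit: placeholder.markdown(rendered)

print(rendered.strip())
```

Because `.stream()` is a plain iterator consumed on the main script thread, no callback handlers or session-context workarounds are needed.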

🧰 3. Fixed Code-Level Bugs

  • 🔄 Circular Import Fixed: Renamed the local langchain.py to app.py so it no longer shadows the installed langchain package.
  • 🔕 Disabled ChromaDB telemetry logs for cleaner terminal output.
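Chroma's telemetry can be silenced either when constructing the client or, as sketched here, via an environment variable set before `chromadb` is imported (the variable name follows Chroma's settings conventions; treat it as an assumption if your version differs):

```python
import os

# Must run before chromadb is imported; equivalent to passing
# chromadb.config.Settings(anonymized_telemetry=False) to the client.
os.environ["ANONYMIZED_TELEMETRY"] = "False"
```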

In summary, the chatbot is now stable, conversational, and fully interactive, ready for production use.


✨ Features

  • 📂 Multi-PDF Support — Upload and query multiple documents
  • 🧩 Chunking & Embedding — Splits content for better context retrieval
  • 🔍 RAG Pipeline — Retrieval + Context-aware AI answers
  • 🧠 Conversational Memory — Handles follow-up questions seamlessly
  • 💬 Real-time Streaming — Smooth token-by-token response in Streamlit
  • 📊 Source Transparency — Displays top 3 document sources
  • Streamlit UI — Simple and interactive interface

🛠️ Tech Stack

| Layer | Technologies | Purpose |
| --- | --- | --- |
| Frontend | Streamlit | Interactive UI |
| Backend | Python, LangChain | RAG pipeline & orchestration |
| Vector DB | ChromaDB | Store & retrieve embeddings |
| Document Loader | PyMuPDF | Parse PDF files |
| LLM + Embeddings | OpenAI (GPT + embeddings) | Contextual QA |

⚙️ How It Works (RAG Pipeline)

  1. 📥 Upload PDFs — User uploads documents via Streamlit UI
  2. ✂️ Text Splitting — Documents are chunked into smaller passages
  3. 🔑 Embedding — Each chunk is embedded using OpenAI embeddings
  4. 💾 Vector Store — Chunks + embeddings stored in ChromaDB
  5. 💬 Query — User asks a question
  6. 🔍 Retriever — Relevant chunks are retrieved
  7. 🤖 LLM Response — GPT answers using retrieved context
  8. 📑 Sources — Top 3 supporting chunks shown
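The steps above can be condensed into a toy, dependency-free sketch: a fixed-size splitter stands in for LangChain's text splitters, and a bag-of-words vector stands in for OpenAI embeddings (all names and numbers are illustrative):

```python
import math

# Toy end-to-end sketch of steps 2-6: chunk a document, "embed" each
# chunk, and retrieve the chunk most similar to the query vector.

def chunk(text, size=40, overlap=10):
    """Fixed-size character chunks with overlap, a stand-in for
    LangChain's text splitters."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def embed(text, vocab):
    """Trivial bag-of-words 'embedding': per-word occurrence counts."""
    return [text.lower().count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

doc = ("Chroma stores embeddings. Streamlit renders the chat UI. "
       "PyMuPDF parses the uploaded PDF files.")
vocab = ["chroma", "streamlit", "pymupdf", "pdf", "embeddings", "chat"]
chunks = chunk(doc)
index = [(c, embed(c, vocab)) for c in chunks]  # step 4: the "vector store"

query_vec = embed("which library parses pdf files?", vocab)
best = max(index, key=lambda item: cosine(item[1], query_vec))[0]
print(best)
```

In the real pipeline the shape is the same; only the embedding function (OpenAI), the store (ChromaDB), and the final answer generation (GPT over the retrieved context) change.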

🧪 Local Development

🔧 Requirements

  • Python 3.9+
  • OpenAI API Key

🏁 Getting Started

1. Clone & Setup

```bash
git clone https://github.com/your-username/rag-file-chatbot.git
cd rag-file-chatbot
```

2. Install Dependencies

```bash
pip install -r requirements.txt
```

3. Add API Key

Create a `.env` file (see `.env.example`):

```env
OPENAI_API_KEY=your_openai_key
```
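For illustration only, here is a dependency-free way to read that file into the environment; the app itself more likely uses python-dotenv's `load_dotenv()`, and the helper name below is hypothetical:

```python
import os

def load_env(path=".env"):
    """Minimal .env reader: KEY=value lines, '#' comments ignored.
    Existing environment variables are not overwritten."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
```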

🚦 Run the App

```bash
streamlit run app.py
```

The app will run locally at 👉 http://localhost:8501


📁 Folder Structure

```
rag-file-chatbot/
├── app.py              # Main Streamlit app
├── requirements.txt    # Python dependencies
├── .env.example        # Example API keys
├── .gitignore
└── README.md
```

🙌 Acknowledgments


Built with ❤️ by Kartik Garg
