πŸ“– PDF Question-Answering App (RAG-based)

A PDF Question-Answering App built with RAG (Retrieval-Augmented Generation), allowing users to upload PDFs and ask context-based questions.
Powered by Streamlit, LangChain, Ollama, and Chroma/FAISS for efficient and accurate answers.

Built with: Streamlit | LangChain | Ollama | FAISS | License: MIT


πŸ“Œ Project Overview

This project is a Retrieval-Augmented Generation (RAG) based Question-Answering System where users can upload a PDF document and ask questions related to its content.

It extracts the text, splits it into chunks, stores the chunks in a vector database, and uses a Large Language Model (LLM) to provide accurate, context-aware answers.
The RAG pipeline ensures that answers are based on real document context to minimize hallucination.
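The "split it into chunks" step can be sketched in plain Python. The app itself would typically use a LangChain text splitter (e.g., RecursiveCharacterTextSplitter); the function name and the size/overlap values below are illustrative assumptions, not the repository's actual code:

```python
# Minimal sketch of overlapping text chunking for a RAG pipeline.
# chunk_size/overlap values are assumptions; real apps tune these per use case.

def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks so context is not lost at boundaries."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step back by `overlap` chars each time
    return chunks

chunks = split_into_chunks("A" * 1200)
print(len(chunks))  # -> 3 (500 + 500 + 300 chars, with 50-char overlaps)
```

The overlap means the tail of each chunk is repeated at the head of the next, so a sentence cut at a chunk boundary still appears whole in at least one chunk.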


✨ Key Features

  • πŸ“‚ Upload PDF and extract text automatically.
  • πŸ” Context-aware Q/A using RAG (retrieves relevant chunks before answering).
  • ⚑ Efficient Vector Search with Chroma/FAISS backend.
  • πŸ€– LLM-powered answers using Ollama + Llama 3.1.
  • πŸ–₯️ Interactive GUI built with Streamlit for seamless use.
  • πŸ”’ Handles temporary files securely during processing.
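The "handles temporary files securely" feature can be sketched as follows. Streamlit's uploader yields raw bytes, which must land on disk before a PDF loader can read them; `with_temp_pdf` and `process_pdf` are hypothetical names used here only for illustration:

```python
# Sketch: write uploaded PDF bytes to a temp file, process, always clean up.
import os
import tempfile

def with_temp_pdf(pdf_bytes, process_pdf):
    """Write uploaded bytes to a temp file, run the processor, then delete it."""
    tmp = tempfile.NamedTemporaryFile(suffix=".pdf", delete=False)
    try:
        tmp.write(pdf_bytes)
        tmp.close()  # close first so PDF loaders on Windows can reopen the path
        return process_pdf(tmp.name)
    finally:
        os.unlink(tmp.name)  # the temporary file is removed even on errors

# Demo with a stand-in processor that just reports the file size.
result = with_temp_pdf(b"%PDF-1.4 demo", lambda path: os.path.getsize(path))
print(result)  # -> 13
```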

πŸ› οΈ Tools & Technologies

  • Streamlit – Interactive web interface
  • LangChain – RAG pipeline
  • Ollama – Local LLM (llama3.1, nomic-embed-text)
  • FAISS – Vector similarity search
  • PyPDF – Extract text from PDFs

πŸ“‹ Prerequisites

Before running the project, make sure you have:

  • 🐍 Python 3.9+ installed
  • πŸ“¦ Ollama installed and running locally β†’ Install Ollama
  • πŸ“₯ Pull required models in Ollama:
    ollama pull llama3.1
    ollama pull nomic-embed-text

πŸ“‚ Project Structure

The repository is organized as follows:

pdf-qa-rag/
│── app.py              # Streamlit app containing the full RAG pipeline
└── requirements.txt    # Python dependencies
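A requirements.txt matching the tools listed above might look like the following; this is an assumption based on the named stack, and the repository's own file is authoritative:

```text
streamlit
langchain
langchain-community
pypdf
chromadb
faiss-cpu
```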

πŸš€ Quick Start

Follow these steps to set up and run the project locally:

1️⃣ Clone the Repository

git clone https://github.com/muqadasejaz/pdf-qa-rag-system.git
cd pdf-qa-rag-system

2️⃣ Create a Virtual Environment (Recommended)

# For Windows
python -m venv venv
venv\Scripts\activate

# For macOS/Linux
python -m venv venv
source venv/bin/activate

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Install Ollama & Pull Required Models

  • Download and install Ollama

  • Pull models locally:

ollama pull llama3.1
ollama pull nomic-embed-text

5️⃣ Run the Application

streamlit run app.py

6️⃣ Open in Browser

  • Streamlit will provide a local URL (e.g. http://localhost:8501).
  • Open it in your browser to access the PDF Question-Answering App.

πŸ“± How to Use

  1. Start the app by running:

    streamlit run app.py
  2. Open the Streamlit link in your browser (default: http://localhost:8501).

  3. πŸ“‚ Upload a PDF

  • Use the file uploader to select any PDF (e.g., notes, research papers, or reports).
  4. ⚑ Processing
  • The system will extract the text, split it into chunks, and store them in the vector database.
  5. πŸ“ Ask Your Question
  • Enter your query in the text box (e.g., "What is the main conclusion of this paper?").
  6. πŸ’‘ Get the Answer
  • The LLM (Ollama + LangChain) retrieves the most relevant context and generates an accurate, context-based answer.

πŸ—οΈ Architecture

The system follows a RAG (Retrieval-Augmented Generation) pipeline for answering questions from PDFs.

(architecture diagram)

Workflow:

  1. πŸ“‚ PDF Documents β†’ Uploaded by the user.
  2. πŸ”Ž LangChain Pipeline β†’ Handles:
    • PDF Text Extraction
    • Text Chunking
    • Vector Store (Chroma/FAISS)
  3. πŸ€– Ollama LLM β†’ Uses llama3.1 for generating answers.
  4. πŸ’‘ Generated Response β†’ Delivered back to the user.
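The vector-search part of step 2 can be illustrated in plain Python. In the app, embeddings come from Ollama's nomic-embed-text model and the nearest-neighbor search is delegated to Chroma/FAISS; the toy word-count "embeddings" below are an assumption used only to make the ranking logic visible:

```python
# Toy retrieval: rank chunks by cosine similarity of word-count vectors.
# A real vector store replaces `embed` with a learned embedding model.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in embedding: bag-of-words counts (punctuation stripped)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "The paper concludes that retrieval reduces hallucination.",
    "Section 2 describes the dataset and preprocessing.",
]
print(retrieve("What is the conclusion of the paper?", chunks))
# the chunk about the paper's conclusion ranks first
```

Only the top-ranked chunks are passed to llama3.1 as context, which is what keeps the generated answer grounded in the uploaded document.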

πŸ–₯️ GUI Preview

The application comes with a simple and interactive Streamlit-based UI.

πŸ“Œ Home Screen

Upload your PDF, enter your question, and get instant answers.

(screenshot: home screen)

πŸ“Œ PDF Upload & Processing

After uploading, the system extracts and processes the PDF text for querying.

(screenshot: PDF upload and processing)

πŸ“Œ Q&A Result

Ask questions, and the system provides precise answers from the document.

  • Answer Query 1: (screenshot)
  • Answer Query 2: (screenshot)
  • Output for Query 2: (screenshot)
  • Wrong Query: (screenshot)

πŸ™ Acknowledgements

This project is built with the help of open-source tools: Streamlit, LangChain, Ollama, Chroma/FAISS, and PyPDF.

Special thanks to the open-source community for providing tools that make projects like this possible πŸš€


πŸ‘€ Author

Muqadas Ejaz

BS Computer Science (AI Specialization)

AI/ML Engineer

Data Science & Gen AI Enthusiast

πŸ“« Connect with me on LinkedIn

🌐 GitHub: github.com/muqadasejaz


πŸ“Ž License

This project is open-source and available under the MIT License.
