A PDF Question-Answering App built with RAG (Retrieval-Augmented Generation), allowing users to upload PDFs and ask context-based questions.
Powered by Streamlit, LangChain, Ollama, and Chroma/FAISS for efficient and accurate answers.
This project is a Retrieval-Augmented Generation (RAG) based Question-Answering System where users can upload a PDF document and ask questions related to its content.
It extracts the text, splits it into chunks, stores it in a vector database, and uses a Large Language Model (LLM) to provide accurate, context-aware answers.
The RAG pipeline ensures that answers are based on real document context to minimize hallucination.
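At a high level, the pipeline can be sketched with LangChain components as follows. This is a minimal, illustrative outline rather than the actual `app.py`; it assumes the `langchain-community`, `langchain-ollama`, `langchain-text-splitters`, and `pypdf` packages are installed and that both Ollama models have been pulled.

```python
# Minimal sketch of the RAG flow (illustrative only; app.py may differ).
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import Chroma
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Extract text from the uploaded PDF
docs = PyPDFLoader("example.pdf").load()

# 2. Split the text into overlapping chunks so retrieval stays focused
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

# 3. Embed the chunks and store them in a vector database
vectordb = Chroma.from_documents(chunks, OllamaEmbeddings(model="nomic-embed-text"))

# 4. Retrieve the chunks most relevant to the question
question = "What is the main conclusion of this paper?"
context = vectordb.similarity_search(question, k=4)

# 5. Ask the local LLM, grounding the answer in the retrieved context
llm = ChatOllama(model="llama3.1")
prompt = (
    "Answer the question using only the context below.\n\n"
    + "\n\n".join(doc.page_content for doc in context)
    + f"\n\nQuestion: {question}"
)
print(llm.invoke(prompt).content)
```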
- Upload a PDF and extract its text automatically.
- Context-aware Q&A using RAG (relevant chunks are retrieved before answering).
- Efficient vector search with a Chroma/FAISS backend.
- LLM-powered answers using Ollama + Llama 3.1.
- Interactive GUI built with Streamlit for seamless use.
- Temporary files are handled securely during processing (see the sketch below).
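A sketch of how an uploaded PDF can be written to a temporary file, loaded, and cleaned up might look like this (an assumed structure, not necessarily the exact code in `app.py`):

```python
import os
import tempfile

import streamlit as st
from langchain_community.document_loaders import PyPDFLoader

uploaded = st.file_uploader("Upload a PDF", type="pdf")
if uploaded is not None:
    # PyPDFLoader reads from disk, so persist the upload to a temp file first
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
        tmp.write(uploaded.getvalue())
        tmp_path = tmp.name
    try:
        pages = PyPDFLoader(tmp_path).load()
        st.success(f"Extracted text from {len(pages)} page(s)")
    finally:
        os.remove(tmp_path)  # always remove the temporary file afterwards
```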
- Interactive web interface
- RAG pipeline
- Local LLM (`llama3.1`, `nomic-embed-text`)
- Vector similarity search (see the example below)
- Extract text from PDFs
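To illustrate the vector similarity search step in isolation (shown here with FAISS; Chroma works analogously), assuming `faiss-cpu` and `langchain-ollama` are installed:

```python
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_ollama import OllamaEmbeddings

# Toy "chunks" standing in for split PDF text
chunks = [
    Document(page_content="The study concludes that retrieval reduces hallucination."),
    Document(page_content="Chapter 2 describes the dataset and preprocessing steps."),
]

index = FAISS.from_documents(chunks, OllamaEmbeddings(model="nomic-embed-text"))

# Return the chunk(s) most similar to the query, with distance scores
for doc, score in index.similarity_search_with_score("What did the study conclude?", k=1):
    print(round(score, 3), doc.page_content)
```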
Before running the project, make sure you have:
- Python 3.9+ installed
- Ollama installed and running locally → Install Ollama
- Pull the required models in Ollama:

```bash
ollama pull llama3.1
ollama pull nomic-embed-text
```
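To confirm that Ollama is running and both models are available before launching the app, you can query Ollama's local REST API (default port 11434). The script below is a small optional helper, not part of the repository:

```python
# check_ollama.py - optional sanity check for the Ollama prerequisites
import json
import urllib.request

REQUIRED = {"llama3.1", "nomic-embed-text"}

def installed_models(base_url: str = "http://localhost:11434") -> set:
    """Return model names reported by the local Ollama server (/api/tags)."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        data = json.load(resp)
    # Names look like "llama3.1:latest"; drop the tag for comparison
    return {m["name"].split(":")[0] for m in data.get("models", [])}

if __name__ == "__main__":
    missing = REQUIRED - installed_models()
    print("Missing models:", ", ".join(sorted(missing)) if missing else "none")
```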
The repository is organized as follows:
```
pdf-qa-rag/
├── app.py
└── requirements.txt
```

Follow these steps to set up and run the project locally:
1️⃣ Clone the Repository

```bash
git clone https://github.com/muqadasejaz/pdf-qa-rag-system.git
```

2️⃣ Create a Virtual Environment (Recommended)

```bash
# For Windows
python -m venv venv
venv\Scripts\activate
```

3️⃣ Install Dependencies

```bash
pip install -r requirements.txt
```

4️⃣ Install Ollama & Pull Required Models

- Download and install Ollama
- Pull models locally:

```bash
ollama pull llama3.1
ollama pull nomic-embed-text
```

5️⃣ Run the Application

```bash
streamlit run app.py
```

6️⃣ Open in Browser
- Streamlit will provide a local URL (e.g. http://localhost:8501).
- Open it in your browser to access the PDF Question-Answering App.
- Start the app by running `streamlit run app.py`.
- Open the Streamlit link in your browser (default: http://localhost:8501).
- Upload a PDF
  - Use the file uploader to select any PDF (e.g., notes, research papers, or reports).
- Processing
  - The system extracts the text, splits it into chunks, and stores them in the vector database.
- Ask Your Question
  - Enter your query in the text box (e.g., "What is the main conclusion of this paper?").
- Get the Answer
  - The LLM (Ollama + LangChain) retrieves the most relevant context and generates an accurate, context-based answer (see the end-to-end sketch below).
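Put together, the upload → process → ask → answer loop can be wired in Streamlit roughly as below. The snippet caches the vector store in `st.session_state` so re-asking questions does not reprocess the PDF; it is a compressed sketch under the same assumptions as earlier, not a copy of the repository's `app.py`.

```python
import os
import tempfile

import streamlit as st
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import Chroma
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

st.title("PDF Question-Answering (RAG)")
pdf = st.file_uploader("Upload a PDF", type="pdf")

if pdf is not None and "vectordb" not in st.session_state:
    with st.spinner("Extracting, chunking and indexing..."):
        with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
            tmp.write(pdf.getvalue())
        docs = PyPDFLoader(tmp.name).load()
        os.remove(tmp.name)
        chunks = RecursiveCharacterTextSplitter(
            chunk_size=1000, chunk_overlap=200
        ).split_documents(docs)
        st.session_state.vectordb = Chroma.from_documents(
            chunks, OllamaEmbeddings(model="nomic-embed-text")
        )

question = st.text_input("Ask a question about the document")
if question and "vectordb" in st.session_state:
    with st.spinner("Retrieving context and generating the answer..."):
        context = st.session_state.vectordb.similarity_search(question, k=4)
        prompt = (
            "Answer using only the context below.\n\n"
            + "\n\n".join(d.page_content for d in context)
            + f"\n\nQuestion: {question}"
        )
        st.write(ChatOllama(model="llama3.1").invoke(prompt).content)
```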
The system follows a RAG (Retrieval-Augmented Generation) pipeline for answering questions from PDFs.
- PDF Documents → Uploaded by the user.
- LangChain Pipeline → Handles:
  - PDF Text Extraction
  - Text Chunking
  - Vector Store (Chroma/FAISS)
- Ollama LLM → Uses `llama3.1` for generating answers.
- Generated Response → Delivered back to the user (a grounding-prompt sketch follows below).
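The grounding step that keeps answers tied to the retrieved chunks can be expressed as a small LangChain (LCEL) chain, for example as below. The prompt wording is illustrative; the actual prompt in `app.py` may differ.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama

# Prompt that restricts the model to the retrieved context
prompt = ChatPromptTemplate.from_template(
    "You answer questions about an uploaded PDF.\n"
    "Use ONLY the context below; if the answer is not there, say you don't know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

# prompt -> local llama3.1 -> plain string
chain = prompt | ChatOllama(model="llama3.1", temperature=0) | StrOutputParser()

print(chain.invoke({
    "context": "The paper concludes that retrieval-augmented generation reduces hallucination.",
    "question": "What is the main conclusion of this paper?",
}))
```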
The application comes with a simple and interactive Streamlit-based UI.
Upload your PDF, enter your question, and get instant answers.
After uploading, the system extracts and processes the PDF text for querying.
Ask questions, and the system provides precise answers from the document.
- Example screenshots: Answer Query 1, Answer Query 2, and the generated output.
This project is built with the help of:
- LangChain – for chaining components in the RAG pipeline
- Ollama – for running LLaMA 3.1 locally
- Streamlit – for the user-friendly interface
- ChromaDB / FAISS – as vector databases for semantic search
- nomic-embed-text – for text embeddings

Special thanks to the open-source community for providing tools that make projects like this possible.
Muqadas Ejaz
BS Computer Science (AI Specialization)
AI/ML Engineer
Data Science & Gen AI Enthusiast
Connect with me on LinkedIn
GitHub: github.com/muqadasejaz
This project is open-source and available under the MIT License.
