RAG: Chat with any PDFs 📚

A Streamlit application that uses Retrieval-Augmented Generation (RAG) to let users chat with PDF documents. Users upload PDFs, ask questions about their contents, and receive AI-generated answers grounded in the uploaded documents.

Features

- PDF Upload: Upload and view PDFs directly in the app sidebar.
- Text Cleaning: Ensures text integrity by normalizing Unicode and removing invalid characters.
- Text Splitting: Splits large PDF content into manageable chunks for efficient processing.
- Vector Database: Creates and stores embeddings for document chunks using Google Generative AI Embeddings.
- Question Answering: Uses Groq’s llama3-70b-8192 model to answer user queries based on the uploaded PDFs.

How It Works

PDF Upload:
- Users upload PDFs, which are rendered in the sidebar for easy viewing.
- The text content of the PDF is extracted and cleaned.
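
The cleaning step can be illustrated with the standard library alone. This is a minimal sketch, not the code in `main.py`: the function name `clean_text` and the exact rules (NFKC normalization, dropping characters that cannot be encoded as UTF-8, removing control characters) are assumptions about what "normalizing Unicode and removing invalid characters" might look like.

```python
import unicodedata


def clean_text(text: str) -> str:
    """Hypothetical cleaner: normalize Unicode and drop unusable characters."""
    # NFKC folds compatibility forms (for example the "fi" ligature U+FB01)
    # into their plain equivalents.
    normalized = unicodedata.normalize("NFKC", text)
    # Drop anything that cannot be encoded as UTF-8, such as lone surrogates.
    encodable = normalized.encode("utf-8", errors="ignore").decode("utf-8")
    # Remove characters in the Unicode "C" categories (control, format,
    # unassigned, ...), but keep newlines and tabs.
    return "".join(
        ch
        for ch in encodable
        if ch in "\n\t" or unicodedata.category(ch)[0] != "C"
    )
```

Running extracted page text through a function like this before splitting helps avoid encoding errors further down the pipeline.
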
Text Processing:
- The extracted text is split into chunks using the RecursiveCharacterTextSplitter.
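
To show what chunking does, here is a simplified stand-in written with plain Python. It is not the `RecursiveCharacterTextSplitter` algorithm (which also tries to break on separators such as paragraphs and sentences); it only illustrates fixed-size chunks with overlap. The function name and the default sizes are assumptions chosen for illustration.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Hypothetical fixed-size splitter with overlapping windows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means that a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, which tends to help retrieval.
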
Embedding and Storage:
- Each text chunk is embedded using the GoogleGenerativeAIEmbeddings model.
- The embeddings are stored in a FAISS vector database.
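
The idea behind embedding and storage can be sketched without any external services. Everything below is a toy: `toy_embed` is a hypothetical hashed bag-of-words function standing in for the GoogleGenerativeAIEmbeddings model, and `TinyVectorStore` is a brute-force in-memory list standing in for FAISS, which provides optimized indexes for the same kind of lookup.

```python
import math
import zlib


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # crc32 is deterministic across runs, unlike the built-in hash().
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # Unit-length vectors make the dot product equal to cosine similarity.
    return [v / norm for v in vec] if norm else vec


class TinyVectorStore:
    """Hypothetical in-memory store: keeps (vector, chunk) pairs in a list."""

    def __init__(self) -> None:
        self._items: list[tuple[list[float], str]] = []

    def add(self, chunks: list[str]) -> None:
        for chunk in chunks:
            self._items.append((toy_embed(chunk), chunk))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = toy_embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), chunk)
            for vec, chunk in self._items
        ]
        # Highest cosine similarity first; ties keep insertion order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for _, chunk in scored[:k]]
```

A real embedding model captures meaning rather than exact word overlap, so this sketch only conveys the store-then-search shape of the step.
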
Question Answering:
- Users submit queries through a chat interface.
- Relevant chunks are retrieved using similarity search.
- Answers are generated using Groq’s llama3-70b-8192 model.
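
Between retrieval and generation, the retrieved chunks have to be combined with the user's question into a single prompt. The sketch below shows one plausible way to do that; the function name `build_prompt` and the prompt wording are assumptions, since the prompt actually used by the app is not shown here, and no call to Groq is made.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Hypothetical prompt builder: number the chunks and append the question."""
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\nAnswer:"
    )
```

The resulting string would then be sent to the language model as the user message. Numbering the chunks makes it easier to trace which passage an answer came from.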

Installation

Clone the Repository:

```
git clone https://github.com/Osama-Abo-Bakr/RAG-Chat-with-any-PDFs.git
cd RAG-Chat-with-any-PDFs
```

Install Dependencies:

Create a virtual environment:

```
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
```

Install required packages:
```
pip install -r requirements.txt
```

Set Up Environment Variables:
- Create a .env file in the project directory (the repository includes a .env-example file you can copy) and add your credentials:
```
GOOGLE_API_KEY=your_google_api_key
GROQ_API_KEY=your_groq_api_key
```
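
A quick way to catch a missing key before starting the app is to check the process environment. This is a minimal sketch with a hypothetical helper name, `missing_keys`; it assumes the values from `.env` have already been loaded into the environment (for example by a loader such as python-dotenv, which is an assumption about how `main.py` reads them).

```python
import os
from collections.abc import Mapping

# The two variable names come from the .env example above.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "GROQ_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required keys that are unset or empty in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing credentials:", ", ".join(missing))
    else:
        print("All required credentials are set.")
```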

Usage

Run the Application:
```
streamlit run main.py
```
Upload and Interact:
- Use the sidebar to upload your PDF.
- Enter your questions in the chat input.
- View the AI-generated answers in real time.

Dependencies

- Streamlit: For building the interactive web application.
- LangChain: For document loaders, text splitting, and chains.
- FAISS: For efficient similarity search.
- Google Generative AI: For generating embeddings.
- Groq: For large language model-based question answering.

Future Improvements

- Add support for multi-file uploads.
- Enhance UI/UX for better user experience.
- Integrate additional document formats (e.g., Word, TXT).
- Allow saving and exporting of chat interactions.

Contributing

Contributions are welcome! Feel free to fork the repository and submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Author

Osama Abo Bakr
GitHub: Osama-Abo-Bakr

Acknowledgments

Special thanks to the developers of Streamlit, LangChain, FAISS, Google Generative AI, and Groq for their amazing tools!
