This project is a Streamlit web application that implements a Retrieval-Augmented Generation (RAG) pipeline using LlamaIndex. It lets users ask questions about a collection of documents and receive answers with citations pointing to the specific sources within those documents.
The application uses ChromaDB as a vector store to efficiently retrieve relevant context from the documents in the `data` directory.
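Conceptually, the vector store's job can be illustrated with a toy, dependency-free sketch: embed each document chunk, then rank chunks by similarity to the query. Real retrieval uses learned embeddings from a model, not the word counts used here — this is only a minimal illustration of the idea.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real app uses an embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]


chunks = [
    "ChromaDB is an open source vector database.",
    "Streamlit builds interactive web apps in Python.",
    "LlamaIndex orchestrates retrieval augmented generation pipelines.",
]
print(retrieve("which vector database is used", chunks))
```

ChromaDB does the same ranking at scale, over embedding vectors it persists on disk, which is why retrieval stays fast as the document collection grows.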
Below are screenshots of an ongoing conversation, showing the LLM's response with source citations:
- Interactive chat interface powered by Streamlit.
- Query documents using natural language.
- Receive answers generated by a Large Language Model (LLM).
- Answers include citations to the source documents for verification.
- Uses LlamaIndex for the RAG pipeline.
- Uses ChromaDB for vector storage.
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd streamlit-llamaindex-rag-citation
  ```
- Create and activate a Python virtual environment:

  ```bash
  python -m venv citationenv
  source citationenv/bin/activate
  # On Windows, use:
  # citationenv\Scripts\activate
  ```
- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
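For reference, the stack described above typically maps to PyPI packages like the following — the repo's own `requirements.txt` is authoritative, and the `llama-index-vector-stores-chroma` integration package assumes a recent (post-0.10) LlamaIndex release:

```text
streamlit
llama-index
llama-index-vector-stores-chroma
chromadb
```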
- Set up your API keys: create a file named `secrets.toml` inside a `.streamlit` directory:

  ```
  .streamlit/
  └── secrets.toml
  ```

- Add your OpenAI API key to `secrets.toml`:

  ```toml
  OPENAI_API_KEY = "sk-..."
  ```
- Add your data: place the PDF documents you want to query in the `data` directory. The project comes with some example documents.

- Run the Streamlit application:

  ```bash
  streamlit run citation_app.py
  ```
- Open your browser: navigate to the local URL provided by Streamlit (usually `http://localhost:8501`).
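How the app renders citations depends on `citation_app.py`, but the general pattern is that a citation-aware query engine (such as LlamaIndex's `CitationQueryEngine`) returns an answer containing numbered markers alongside the source chunks they refer to. A hypothetical sketch of mapping markers back to sources — the marker format and helper name here are illustrative assumptions, not the app's actual code:

```python
import re

# Hypothetical illustration: given an LLM answer containing numbered
# citation markers like "[1]", return only the sources it actually cites.
def cited_sources(answer: str, sources: dict[int, str]) -> dict[int, str]:
    """Map citation markers in the answer back to their source descriptions."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return {n: sources[n] for n in sorted(cited) if n in sources}


sources = {1: "report.pdf, page 3", 2: "manual.pdf, page 12", 3: "notes.pdf, page 1"}
answer = "Revenue grew 12% [1], driven by new markets [3]."
print(cited_sources(answer, sources))
# → {1: 'report.pdf, page 3', 3: 'notes.pdf, page 1'}
```

Displaying only the cited subset, as above, lets users verify each claim against the exact document and page it came from.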
```
.
├── .gitignore
├── citation_app.py      # Main Streamlit application
├── readme.md            # This file
├── requirements.txt     # Python dependencies
├── .streamlit/
│   └── secrets.toml     # Secrets management for Streamlit
├── chroma_db/           # ChromaDB vector store
├── citation/            # LlamaIndex storage
├── citationenv/         # Python virtual environment
└── data/                # Source documents (PDFs)
```

