Releases: onamfc/rag-chat
Xantus v0.1.0 - Initial Release
I'm excited to announce the initial release of Xantus, a privacy-first RAG (Retrieval-Augmented Generation) chat system that lets you have conversations with your documents using AI.
What is Xantus?
Xantus is an open-source document chat system that allows you to upload documents and ask questions about them using large language models. Unlike cloud-only solutions, Xantus can run completely locally or use cloud providers - your choice.
Key Philosophy:
- Privacy First - All data stays on your system with local AI
- Extensible - MCP integration for external tools
- Multiple UIs - Streamlit interface + OpenAI-compatible API
- Multi-Provider - Supports Ollama, OpenAI, Anthropic, and more
Key Features
Document Processing
- Multiple Format Support - Upload PDF, DOCX, TXT, and Markdown files
- Semantic Search - RAG-powered retrieval with ChromaDB vector store
- Smart Chunking - Configurable chunk size and overlap for optimal retrieval
- Complete Deletion - Properly removes documents from vector store (no orphaned data!)
Chat & Retrieval
- Interactive Chat - Natural conversation interface with context awareness
- Source Citations - See exactly where answers came from with:
  - Document name and page number
  - Chunk index for precise location
  - Relevance score (0-100%)
  - Full text excerpts (expandable)
- Configurable RAG - Adjust similarity_top_k, chunk size, and overlap
- Chat History - Conversation persistence in the UI
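The relevance score attached to each citation is a 0-100% rendering of the vector-similarity result. A minimal sketch of such a conversion (illustrative only; the exact formula Xantus uses may differ):

```python
def relevance_percent(similarity: float) -> int:
    """Map a cosine similarity in [-1, 1] to a 0-100% relevance score.

    Illustrative conversion only -- Xantus's actual scoring may differ.
    """
    clamped = max(-1.0, min(1.0, similarity))
    return round((clamped + 1.0) / 2.0 * 100)
```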
AI Provider Support
- Ollama - Completely local, privacy-first (Llama 3.2, Mistral, etc.)
- OpenAI - GPT-4, GPT-3.5-turbo with streaming support
- Anthropic - Claude Sonnet 4, Claude Haiku
- Hybrid Mode - Cloud LLM + local embeddings for cost optimization
MCP (Model Context Protocol) Integration
- External Tools - Calculator, file system, text processing, weather
- Extensible - Add custom MCP servers easily
- TypeScript Support - Integrated MCP server template
- Built-in Tools:
  - Calculator - Perform mathematical operations
  - File System - Read/write/list files
  - Text Processing - Word count, sentiment analysis, case conversion
  - Weather - Weather data retrieval
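To give a feel for what a built-in tool does, here is a hypothetical sketch of a calculator tool handler. The actual server ships in `mcp-servers/` (TypeScript), so the name `calculate` and this signature are assumptions for illustration:

```python
import ast
import operator

# Hypothetical sketch of a calculator tool handler; the shipped MCP
# server is TypeScript and its API may differ.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def calculate(expression: str) -> float:
    """Safely evaluate basic arithmetic by walking the AST (no eval)."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression}")
    return _eval(ast.parse(expression, mode="eval"))
```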
User Interface
- Streamlit UI - Clean, modern chat interface
- Settings Panel - Toggle RAG context and source citations
- Document Management - Upload, list, and delete documents
- Real-time Updates - Live document list and chat history
- Responsive - Works on desktop and tablet
API & Integration
- RESTful API - OpenAI-compatible chat completions endpoint
- FastAPI - High-performance async API server
- CORS Support - Configurable cross-origin requests
- Health Checks - Monitor system status
- OpenAPI Docs - Auto-generated API documentation at `/docs`
Architecture & Quality
- Dependency Injection - Clean, testable architecture using Injector
- Factory Pattern - Easy to swap LLMs, embeddings, vector stores
- Type Safety - Pydantic models throughout
- Async Support - Non-blocking operations
- Structured Logging - Clear visibility into system operations
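The factory pattern mentioned above makes providers swappable behind a single construction point. A minimal sketch of that idea (class and registry names here are hypothetical; the real factories live in `xantus/components/`):

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical sketch of the provider-factory pattern; Xantus's real
# factories in xantus/components/ may be structured differently.

@dataclass
class LLMConfig:
    provider: str
    model: str

class OllamaLLM:
    def __init__(self, model: str):
        self.model = model

class OpenAILLM:
    def __init__(self, model: str):
        self.model = model

_REGISTRY: Dict[str, Callable[[str], object]] = {
    "ollama": OllamaLLM,
    "openai": OpenAILLM,
}

def create_llm(config: LLMConfig):
    """Instantiate the LLM client for the configured provider."""
    try:
        return _REGISTRY[config.provider](config.model)
    except KeyError:
        raise ValueError(f"Unknown provider: {config.provider}") from None
```

Swapping providers then means changing one config value, not any call sites.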
What's Included
Core Components
```
xantus/
├── xantus/                    # Main Python package
│   ├── api/                   # FastAPI endpoints
│   │   ├── chat_router.py     # Chat completions
│   │   ├── ingest_router.py   # Document upload/management
│   │   └── embeddings_router.py
│   ├── services/              # Business logic
│   │   ├── chat_service.py    # RAG chat with sources
│   │   ├── ingest_service.py  # Document processing
│   │   └── mcp_service.py     # MCP orchestration
│   ├── components/            # Component factories
│   │   ├── llm/               # LLM provider factory
│   │   ├── embeddings/        # Embedding factory
│   │   └── vector_store/      # Vector store factory
│   ├── models/                # Data models
│   └── config/                # Settings management
├── ui/                        # Streamlit interface
├── mcp-servers/               # MCP integration (submodule)
└── config.yaml                # Main configuration
```
Supported Configurations
Fully Local (Privacy-First):
- Ollama for LLM (Llama 3.2, Mistral, etc.)
- HuggingFace embeddings (BAAI/bge-small-en-v1.5)
- ChromaDB vector store
- Zero external API calls
Cloud-Powered:
- Anthropic Claude Sonnet 4 / Haiku
- OpenAI GPT-4 / GPT-3.5-turbo
- OpenAI embeddings
- ChromaDB local storage
Hybrid (Recommended):
- Cloud LLM (better quality)
- Local embeddings (lower cost)
- ChromaDB local storage
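For orientation, a hybrid setup might look like the following `config.yaml` sketch (key names and layout are assumptions; check the shipped `config.yaml` for the actual schema):

```yaml
# Hypothetical hybrid configuration sketch -- key names may differ
# from the actual schema in config.yaml.
llm:
  provider: anthropic
  model: claude-sonnet-4
embeddings:
  provider: huggingface
  model: BAAI/bge-small-en-v1.5
vector_store:
  provider: chromadb
  persist_dir: ./storage/chroma
rag:
  chunk_size: 1024
  chunk_overlap: 200
  similarity_top_k: 5
```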
Technical Details
Vector Store
- Provider: ChromaDB with persistent storage
- Features:
  - Proper document deletion (filters by `file_name`)
  - Metadata recovery on restart
  - Configurable collection names
- Storage: Local SQLite-based storage
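Deletion by `file_name` means every chunk carrying that metadata value is removed, so no orphaned vectors remain. A minimal in-memory stand-in for that behavior (with ChromaDB itself this maps to a delete with a metadata `where` filter; exact call shape may vary by version):

```python
# In-memory stand-in for metadata-filtered deletion; Xantus performs
# the equivalent against the ChromaDB collection.

def delete_document(store: dict, file_name: str) -> int:
    """Remove every chunk whose metadata matches file_name; return count."""
    doomed = [cid for cid, rec in store.items()
              if rec["metadata"].get("file_name") == file_name]
    for cid in doomed:
        del store[cid]
    return len(doomed)
```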
RAG Configuration
- Default chunk size: 1024 characters
- Default overlap: 200 characters
- Default top_k: 5 similar chunks
- Splitter: SentenceSplitter for semantic boundaries
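To make the size/overlap mechanics concrete, here is an illustrative character-based chunker using the defaults above. (Xantus uses LlamaIndex's SentenceSplitter, which additionally respects sentence boundaries; this sketch shows only how size and overlap interact.)

```python
def chunk_text(text: str, chunk_size: int = 1024, overlap: int = 200):
    """Split text into fixed-size chunks where each chunk repeats the
    last `overlap` characters of the previous one.

    Illustrative only -- Xantus's SentenceSplitter also keeps chunks
    aligned to sentence boundaries.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

The overlap keeps context that straddles a chunk boundary retrievable from either side.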
API Endpoints
Chat:
- `POST /v1/chat/completions` - Chat with RAG context
- `POST /v1/chunks/retrieve` - Retrieve similar chunks
Documents:
- `POST /v1/ingest/file` - Upload document
- `GET /v1/documents` - List all documents
- `DELETE /v1/documents/{id}` - Delete document
System:
- `GET /health` - Health check
- `GET /docs` - OpenAPI documentation
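Because the chat endpoint is OpenAI-compatible, a request body looks like a standard chat-completions payload plus Xantus's source-citation switch. A sketch of building that body (the `model` value and the exact placement of `include_sources` are assumptions):

```python
import json

def build_chat_request(question: str, include_sources: bool = True) -> str:
    """Build a JSON body for POST /v1/chat/completions.

    OpenAI-compatible shape; "model" value and the include_sources
    field placement are assumptions for illustration.
    """
    return json.dumps({
        "model": "default",
        "messages": [{"role": "user", "content": question}],
        "stream": False,
        "include_sources": include_sources,
    })
```

POST the resulting body to your running server (e.g. `http://localhost:8000/v1/chat/completions`; host and port depend on your deployment).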
Known Issues & Limitations
Current Limitations
- Streaming with Sources - Source citations only available in non-streaming mode
- Single Collection - All documents in one ChromaDB collection
- No User Authentication - API is open (add middleware for production)
- In-Memory Metadata - Document metadata not persisted to database (recovered from files)
- No Multi-tenancy - Single-user design (can be extended)
Workarounds
- For streaming, disable the `include_sources` parameter
- For production, add FastAPI authentication middleware
- For multiple users, extend with database-backed metadata storage
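The core of such an authentication middleware is a bearer-token check on the `Authorization` header. A minimal sketch of that check as a pure function (wiring it into FastAPI middleware is not shown; the function name is hypothetical):

```python
import hmac
from typing import Optional

def is_authorized(auth_header: Optional[str], api_key: str) -> bool:
    """Validate an "Authorization: Bearer <token>" header against a key.

    Sketch of the check an auth middleware could perform; uses a
    constant-time comparison to avoid timing leaks.
    """
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    token = auth_header[len("Bearer "):]
    return hmac.compare_digest(token, api_key)
```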
What's Next (Future Releases)
Planned features for upcoming versions:
v0.2.0 (Planned)
- Streaming support with sources
- Persistent metadata storage (PostgreSQL/SQLite)
- Multiple vector store collections
- Reranking support for better retrieval
- Document update/versioning
v0.3.0 (Planned)
- User authentication & authorization
- Multi-tenancy support
- Document folder organization
- Advanced search filters
- Conversation management (save/load)
Future Considerations
- Qdrant vector store support
- Additional embedding providers
- Chat export functionality
- Document preprocessing pipeline
- OCR support for scanned PDFs
- Image/table extraction
Metrics
Lines of Code: ~2,500 (excluding MCP server)
Dependencies: 15 core packages
Supported File Types: 4 (PDF, DOCX, TXT, MD)
LLM Providers: 3 (Ollama, OpenAI, Anthropic)
MCP Tools: 4 built-in tools
Acknowledgments
Built with:
- FastAPI - Modern async web framework
- LlamaIndex - RAG framework
- ChromaDB - Vector database
- Streamlit - UI framework
- Model Context Protocol - Tool integration
License
Xantus is released under the MIT License. See LICENSE for details.
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
Ways to contribute:
- Report bugs
- Suggest features
- Improve documentation
- Submit pull requests
- Star the repo!
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: See README.md and docs/