A comprehensive RAG (Retrieval-Augmented Generation) system built with Go, SQLite, and Ollama. Lil-RAG provides three unified interfaces (CLI, HTTP API, and MCP server) with consistent parameters and comprehensive help documentation for indexing documents, running semantic similarity searches, and AI-powered chat with advanced chunking strategies.
- Semantic Vector Search - Advanced similarity search using SQLite with the sqlite-vec extension
- Interactive Chat - RAG-powered chat with persistent sessions and source citations
- Multi-Format Support - Native parsing for PDF, DOCX, XLSX, HTML, CSV, and text files
- Document Management - Complete CRUD operations for indexed documents
- Smart Storage - Automatic gzip compression and intelligent deduplication
- Complete Documents - Returns full document content, not just chunks
- Advanced Chunking - Three chunking strategies: recursive, semantic, and simple
- CLI Application - Full-featured command-line interface with comprehensive help documentation
- HTTP API Server - RESTful API with interactive web interface and authentication
- MCP Server - Model Context Protocol server for AI assistant integration
- Built-in Documentation - Comprehensive docs accessible via the /docs route
- Consistent Parameters - All interfaces support identical parameters and options
- High Performance - Optimized Go implementation with efficient SQLite storage
- Ollama Integration - Configurable embedding, chat, and vision models via Ollama
- Profile Configuration - User-friendly configuration management with comprehensive options
- Persistent Storage - Reliable SQLite database with WAL mode
- Health Monitoring - Built-in health checks and metrics endpoints
- Authentication System - Optional password protection with secure session management
- Comprehensive Help - Detailed help documentation for all commands and parameters
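The smart storage behavior above can be illustrated with a minimal sketch (not Lil-RAG's actual implementation): compress each document with gzip and key it by a content hash, so identical content is stored only once.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// store holds gzip-compressed documents keyed by content hash,
// so identical content is only stored once.
var store = map[string][]byte{}

// put compresses content with gzip and deduplicates by SHA-256.
// It returns the content key and whether the content was new.
func put(content []byte) (string, bool, error) {
	sum := sha256.Sum256(content)
	key := hex.EncodeToString(sum[:])
	if _, ok := store[key]; ok {
		return key, false, nil // duplicate: nothing written
	}
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(content); err != nil {
		return "", false, err
	}
	if err := zw.Close(); err != nil {
		return "", false, err
	}
	store[key] = buf.Bytes()
	return key, true, nil
}

func main() {
	doc := []byte("This is about machine learning and neural networks.")
	_, fresh, _ := put(doc)
	fmt.Println(fresh) // true: first insert is stored
	_, fresh, _ = put(doc)
	fmt.Println(fresh) // false: identical content is deduplicated
}
```

The real system persists compressed files on disk and tracks them in SQLite, but the hash-then-compress idea is the same.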
- Go 1.21+ with CGO support
- Ollama with an embedding model installed
- SQLite with sqlite-vec extension support
- Install Go: Download from golang.org
- Install Ollama: Follow the instructions at ollama.ai
  # Start Ollama
  ollama serve
  # Pull an embedding model
  ollama pull nomic-embed-text
- SQLite-vec Extension: The Go bindings handle this automatically via CGO
One-line install from GitHub releases:
# Install to ~/.local/bin (Linux/macOS)
curl -fsSL https://raw.githubusercontent.com/streed/lil-rag/main/install.sh | bash
# Or download and run manually
curl -fsSL -O https://raw.githubusercontent.com/streed/lil-rag/main/install.sh
chmod +x install.sh
./install.sh
# Install to custom directory
./install.sh --dir /usr/local/bin
# Windows users can use Git Bash or WSL

The install script will:
- Auto-detect your OS and architecture
- Download the latest release from GitHub
- Extract and install binaries
- Verify the installation
- Show quick start instructions
# Clone the repository
git clone https://github.com/streed/lil-rag.git
cd lil-rag
# Build both CLI and server
make build
# Or build individually
make build-cli # builds bin/lil-rag
make build-server # builds bin/lil-rag-server
make build-mcp # builds bin/lil-rag-mcp
# Install to $GOPATH/bin (optional)
make install
# Note: Pre-built binaries are available for Linux and Windows
# macOS users should build from source using the commands above

# Install CLI directly
go install github.com/streed/lil-rag/cmd/lil-rag@latest
# Install server directly
go install github.com/streed/lil-rag/cmd/lil-rag-server@latest
# Install MCP server directly
go install github.com/streed/lil-rag/cmd/lil-rag-mcp@latest

# Start Ollama (in a separate terminal)
ollama serve
# Pull an embedding model
ollama pull nomic-embed-text

# Initialize user profile configuration
lil-rag config init
# View current settings
lil-rag config show

# Index direct text (ID auto-generated if not provided)
lil-rag index "This is about machine learning and neural networks."
# Index with explicit ID
lil-rag index doc1 "This is about machine learning and neural networks."
# Index from a file with chunking strategy
lil-rag index --chunking=semantic document.txt
# Index a PDF file with specific ID and chunking
lil-rag index --chunking=recursive doc3 research_paper.pdf
# Index from stdin with chunking strategy
echo "Content about artificial intelligence" | lil-rag index --chunking=simple -

# Search with default settings
lil-rag search "machine learning"
# Search with custom limit
lil-rag search --limit=5 "neural networks"
# Search returning only matching chunks (no full documents)
lil-rag search --chunks-only "AI concepts"
# Search with both limit and chunks-only
lil-rag search --limit=3 --chunks-only "machine learning algorithms"
# Get help for all search options
lil-rag search --help

Example Output:
Found 2 results:
1. ID: doc1 [Best match: Chunk 1] (Score: 0.8542)
This is about machine learning and neural networks. Neural networks are...
[complete document content shown]
2. ID: doc3 [Best match: Page 1] (Score: 0.7891)
Research Paper: Deep Learning Fundamentals...
[complete document content shown]
- index [OPTIONS] [id] <text|file|-> - Index content with advanced chunking strategies
  - --chunking=STRATEGY - Choose chunking strategy: recursive, semantic, simple (default: recursive)
- search [OPTIONS] <query> - Search for similar content with flexible options
  - --limit=N - Maximum number of results to return (default: 10)
  - --chunks-only - Return only matching chunks without full document context
- chat [OPTIONS] <message> [limit] - Interactive chat with RAG context
  - --session-id <id> - Resume an existing chat session
  - --new-session - Start a new chat session
  - --list-sessions - List all chat sessions
  - --show-sources - Display detailed source information
- documents - List all indexed documents with metadata
- delete <id> [--force] - Delete a document by ID
- reindex [OPTIONS] - Reprocess all documents with a new chunking strategy
  - --chunking=STRATEGY - Chunking strategy for reprocessing
  - --force - Skip the confirmation prompt
- health - Check system health status
- config <init|show|set> - Manage configuration
- auth <add|list|delete|reset-password> - Manage authentication users
- reset [--force] - Delete the database and all data
Tip: All commands support --help or -h for detailed usage information and examples.
# Index with auto-generated IDs and chunking strategies
lil-rag index "Hello world" # Direct text, auto ID, default chunking
lil-rag index --chunking=semantic document.pdf # PDF with semantic chunking
lil-rag index --chunking=recursive document.docx # Word document with recursive chunking
echo "Hello world" | lil-rag index --chunking=simple - # From stdin with simple chunking
# Index with explicit IDs and chunking strategies
lil-rag index --chunking=semantic doc1 "Hello world" # Text with ID and semantic chunking
lil-rag index --chunking=recursive doc2 document.pdf # PDF with ID and recursive chunking
echo "Hello world" | lil-rag index --chunking=simple doc3 - # Stdin with ID and simple chunking
# Advanced document operations
lil-rag documents # List all documents with metadata
lil-rag delete doc1 # Delete with confirmation
lil-rag delete doc2 --force # Delete without confirmation
lil-rag reindex --chunking=semantic # Reprocess all documents with semantic chunking
lil-rag reindex --chunking=recursive --force # Reprocess without confirmation
# Get help for any command
lil-rag index --help # Detailed index help with examples
lil-rag reindex --help                       # Detailed reindex help

# Search examples with new options
lil-rag search "machine learning" # Default search (limit=10, full documents)
lil-rag search --limit=5 "machine learning" # Search with custom limit
lil-rag search --chunks-only "AI concepts" # Return only matching chunks
lil-rag search --limit=3 --chunks-only "neural networks" # Combined options
# Chat examples with session management and source control
lil-rag chat "What is machine learning?" # Basic chat with default sources
lil-rag chat --show-sources "Explain neural networks" # Chat with explicit source display
lil-rag chat --new-session "Start a new conversation" # Force new session
lil-rag chat --session-id abc123 "Continue our discussion" # Resume specific session
lil-rag chat --list-sessions # List all chat sessions
# Get help for detailed options
lil-rag search --help # All search options and examples
lil-rag chat --help                          # All chat options and examples

LilRag supports persistent chat sessions that allow you to continue conversations across multiple CLI invocations. Each session gets a unique ID that you can use to resume conversations later.
# Create a new chat session
lil-rag chat --new-session "Hello, I want to start a conversation"
# Output: Created new chat session: abc123-def456-ghi789
# Tip: Use --session-id abc123-def456-ghi789 to resume this conversation later
# Resume an existing chat session
lil-rag chat --session-id abc123-def456-ghi789 "Continue our discussion"
# Output: Resuming chat session: abc123-def456-ghi789 (Title: New Chat)
# List all chat sessions
lil-rag chat --list-sessions
# Shows: ID, Title, Message count, Created/Updated timestamps
# You can also create sessions without the --new-session flag
lil-rag chat "This creates a session automatically"
# When no session is specified, a new one is created automatically
# Combine with context limit
lil-rag chat --session-id abc123-def456-ghi789 "Follow up question" 3

Session Features:
- Persistent Storage - Messages are saved even if the AI response fails
- Resume Conversations - Use a session ID to continue discussions
- Session Management - List, track, and organize your conversations
- Automatic Timestamps - Track when sessions were created and updated
- Message Counting - See how many messages are in each session
- Auto-generated IDs - Unique session identifiers for easy reference
# Configuration management with help
lil-rag config init # Initialize profile config
lil-rag config show # Show current config
lil-rag config set ollama.model nomic-embed-text # Update embedding model
lil-rag config set ollama.chat-model llama3.2 # Update chat model
lil-rag config --help # Get detailed config help
# Authentication management
lil-rag auth add username password # Add new user
lil-rag auth list # List all users
lil-rag auth delete username # Delete user
lil-rag auth reset-password username newpass # Reset user password
lil-rag auth --help # Get detailed auth help
# System management with comprehensive help
lil-rag health # Check system health
lil-rag health --help # Get health check details
lil-rag reset # Reset database (with confirmation)
lil-rag reset --force # Reset database (skip confirmation)
lil-rag reset --help              # Get detailed reset information

-db string           Database path (overrides profile config)
-data-dir string Data directory (overrides profile config)
-ollama string Ollama URL (overrides profile config)
-model string Embedding model (overrides profile config)
-chat-model string Chat model (overrides profile config)
-vision-model string Vision model for image processing (overrides profile config)
-timeout int Ollama timeout in seconds (overrides profile config)
-vector-size int Vector size (overrides profile config)
-help Show help
-version             Show version

# Start with default settings (localhost:12121)
lil-rag-server
# Start with custom host/port
lil-rag-server --host 0.0.0.0 --port 9000
# Start with authentication disabled for development
lil-rag-server --no-secure
# Start with custom HTTP timeouts
lil-rag-server --read-timeout 120 --write-timeout 120 --idle-timeout 300

Visit http://localhost:12121 for the web interface with API documentation and interactive chat.
# Create first user (enables authentication)
lil-rag auth add admin mySecurePassword123
# List users
lil-rag auth list
# Disable authentication for development
lil-rag config set server.secure false

The HTTP server includes a modern, responsive chat interface for conversing with your indexed documents:
Features:
- Modern, responsive design with JetBrains Mono font
- Document browser sidebar with click-to-view functionality
- Real-time chat with RAG-powered responses
- Source citations with relevance scores
- Full document display when clicking on sidebar items
- Mobile-friendly responsive layout
Access the Chat:
- Start the server: lil-rag-server
- Open your browser to http://localhost:12121/chat
- Browse indexed documents in the sidebar
- Ask questions about your documents in the chat
Chat Interface Capabilities:
- Ask questions about indexed content
- View source documents with relevance scores
- Browse and preview all indexed documents
- See full document content by clicking sidebar items
- Markdown rendering for formatted responses
Index content with a unique document ID and advanced chunking strategies.
JSON Request with Chunking Strategy:
curl -X POST http://localhost:8080/api/index \
-H "Content-Type: application/json" \
-d '{
"id": "doc1",
"text": "This document discusses machine learning algorithms and their applications in modern AI systems.",
"chunking_strategy": "semantic"
}'

File Upload with Chunking Strategy:
curl -X POST http://localhost:8080/api/index \
-F "id=doc2" \
-F "chunking_strategy=recursive" \
-F "file=@document.pdf"

Response:
{
"success": true,
"id": "doc1",
"message": "Successfully indexed 123 characters"
}

Search using query parameters or a JSON body, with a chunks-only option.
# GET request with query parameters
curl "http://localhost:8080/api/search?query=machine%20learning&limit=5&chunks_only=false"
# POST request with all options (recommended)
curl -X POST http://localhost:8080/api/search \
-H "Content-Type: application/json" \
-d '{
"query": "artificial intelligence applications",
"limit": 3,
"chunks_only": true
}'

Response:
{
"results": [
{
"ID": "doc1",
"Text": "This document discusses machine learning algorithms...",
"Score": 0.8542,
"Metadata": {
"chunk_index": 1,
"chunk_type": "text",
"is_chunk": true,
"file_path": "/path/to/compressed/file.gz",
"matching_chunk": "...algorithms and their applications..."
}
}
]
}

Interactive chat with RAG context, session management, and source control.
# Basic chat request
curl -X POST http://localhost:8080/api/chat \
-H "Content-Type: application/json" \
-d '{
"message": "What is machine learning?",
"limit": 5
}'
# Advanced chat with session management and source control
curl -X POST http://localhost:8080/api/chat \
-H "Content-Type: application/json" \
-d '{
"message": "Continue our discussion about neural networks",
"session_id": "existing-session-123",
"new_session": false,
"show_sources": true,
"limit": 5
}'

Response:
{
"response": "Machine learning is a subset of artificial intelligence...",
"sources": [
{
"ID": "doc1",
"Text": "Machine learning algorithms...",
"Score": 0.8542
}
],
"query": "What is machine learning?"
}

List all indexed documents with metadata.
curl http://localhost:8080/api/documents

Response:
{
"documents": [
{
"id": "doc1",
"doc_type": "text",
"chunk_count": 3,
"source_path": "/path/to/file.txt",
"created_at": "2024-01-15T10:30:00Z",
"updated_at": "2024-01-15T10:30:00Z"
}
]
}

Delete a specific document and all its chunks.

curl -X DELETE http://localhost:8080/api/documents/doc1

Health check endpoint for monitoring.

curl http://localhost:8080/api/health

Response:
{
"status": "healthy",
"timestamp": "2024-01-15T10:30:00Z",
"version": "1.0.0"
}

Performance metrics and system information.

curl http://localhost:8080/api/metrics

- Home: http://localhost:8080/ - API overview and quick actions
- Chat Interface: http://localhost:8080/chat - Interactive chat with your documents
- Document Library: http://localhost:8080/documents - Browse and manage documents
- Documentation: http://localhost:8080/docs - Complete API reference and guides
The Model Context Protocol (MCP) server allows AI assistants and tools to interact with your RAG system seamlessly.
# Start with default settings
lil-rag-mcp
# The server uses the same profile configuration as CLI/HTTP server
# Or falls back to environment variables:
LILRAG_DB_PATH=/path/to/database.db \
LILRAG_OLLAMA_URL=http://localhost:11434 \
LILRAG_MODEL=nomic-embed-text \
lil-rag-mcp

All MCP tools now support the same parameters as the CLI and HTTP interfaces for complete consistency.
Index text content into the RAG system with advanced chunking strategies.
Parameters:
- text (required): Text content to index
- id (optional): Document ID (auto-generated if not provided)
- chunking_strategy (optional): Chunking strategy: recursive, semantic, or simple (default: recursive)
Index files (PDF, DOCX, XLSX, HTML, CSV, text) with advanced chunking strategies.
Parameters:
- file_path (required): Path to the file to index
- id (optional): Document ID (defaults to filename)
- chunking_strategy (optional): Chunking strategy: recursive, semantic, or simple (default: recursive)
Semantic similarity search with flexible result options.
Parameters:
- query (required): Search query
- limit (optional): Max results (default: 10, max: 50)
- chunks_only (optional): Return only matching chunks without full document context (default: false)
Interactive chat with RAG context, session management, and source control.
Parameters:
- message (required): Question or message
- limit (optional): Max context documents (default: 5, max: 20)
- session_id (optional): Session ID to maintain conversation context
- new_session (optional): Start a new chat session (default: false)
- show_sources (optional): Display detailed source information (default: true)
List all indexed documents with metadata.
Parameters: None
Delete a document and all its chunks with optional force mode.
Parameters:
- document_id (required): ID of the document to delete
- force (optional): Skip the confirmation prompt (default: false; note: no effect in MCP, as operations are programmatic)
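As a concrete (hypothetical) illustration, MCP clients invoke these tools over JSON-RPC with a tools/call request. The tool name shown here (search) and the exact wire format are assumptions; confirm the actual tool names via the server's tools/list response.

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search",
    "arguments": {
      "query": "machine learning",
      "limit": 5,
      "chunks_only": false
    }
  }
}
```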
The MCP server can be integrated with various AI tools and assistants that support the Model Context Protocol. The server provides a standard interface for document indexing, searching, and chat functionality.
Lil-RAG supports three advanced chunking strategies across all interfaces (CLI, HTTP, and MCP):
- Best for: General-purpose text processing and most documents
- Approach: Hierarchical text splitting with semantic boundaries
- Features:
- Respects paragraph and sentence boundaries
- Maintains logical document structure
- Optimal balance between context and precision
- Use when: You want reliable, consistent chunking for mixed content types
- Best for: Documents where topic coherence is critical
- Approach: Adaptive chunking focused on semantic similarity between sentences
- Features:
- Groups semantically related content together
- Dynamically adjusts chunk boundaries based on content similarity
- Preserves topical coherence within chunks
- Use when: Working with research papers, technical documentation, or content where maintaining topic boundaries is important
- Best for: Quick processing and straightforward text splitting
- Approach: Basic character-based chunking with word boundaries
- Features:
- Fast processing with minimal computational overhead
- Predictable chunk sizes
- Good for simple text extraction scenarios
- Use when: You need fast processing or working with simple, homogeneous text
# For general documents and mixed content (recommended default)
lil-rag index --chunking=recursive document.pdf
# For academic papers and technical documents where topic coherence matters
lil-rag index --chunking=semantic research_paper.pdf
# For quick processing of simple text
lil-rag index --chunking=simple plain_text.txt
# Reprocess existing documents with a different strategy
lil-rag reindex --chunking=semantic --force

Performance Notes:
- Recursive: Balanced performance and quality
- Semantic: Higher computational cost due to similarity calculations, but better topic coherence
- Simple: Fastest processing, minimal memory usage
All chunking strategies respect the configured max_chars and overlap settings from your profile configuration.
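As a rough illustration of the character-based splitting these settings control, here is a minimal sketch of simple chunking with overlap. This is not the project's actual chunker (the real recursive and semantic strategies also respect word, sentence, and topic boundaries); it only shows how max_chars and overlap interact.

```go
package main

import "fmt"

// chunk splits text into pieces of at most maxChars characters,
// with each chunk repeating the last `overlap` characters of the
// previous one. Runes are used so multi-byte characters are not split.
func chunk(text string, maxChars, overlap int) []string {
	runes := []rune(text)
	var chunks []string
	step := maxChars - overlap
	for start := 0; start < len(runes); start += step {
		end := start + maxChars
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break
		}
	}
	return chunks
}

func main() {
	text := "abcdefghij" // 10 characters
	for _, c := range chunk(text, 4, 1) {
		fmt.Println(c) // prints: abcd, defg, ghij
	}
}
```

With the default max_chars=2000 and overlap=200, consecutive chunks share a 200-character window, which helps queries that land near a chunk boundary.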
LilRag uses a profile-based configuration system that stores settings in a JSON file in your user profile directory (~/.lilrag/config.json).
# Initialize profile configuration with defaults
lil-rag config init
# View current configuration
lil-rag config show

The configuration includes:
- Ollama Settings: Endpoint URL, embedding model, and vector size
- Storage: Database path and data directory for indexed content
- Server: HTTP server host and port
Example profile configuration (~/.lilrag/config.json):
{
"ollama": {
"endpoint": "http://localhost:11434",
"embedding_model": "nomic-embed-text",
"chat_model": "llama3.2",
"vision_model": "llama3.2-vision",
"timeout_seconds": 30,
"vector_size": 768
},
"storage_path": "/home/user/.lilrag/data/lilrag.db",
"data_dir": "/home/user/.lilrag/data",
"server": {
"host": "localhost",
"port": 8080
},
"chunking": {
"max_chars": 2000,
"overlap": 200
}
}

LilRag supports image processing with configurable vision models for OCR and image analysis:
- vision_model: Vision model for image processing (default: "llama3.2-vision")
- Supports any Ollama vision model (llama3.2-vision, llava, bakllava, etc.)
- Automatically handles image files (JPG, PNG, PDF with images, etc.)
Configure HTTP timeouts for Ollama API calls:
- timeout_seconds: Base timeout for API calls (default: 30 seconds)
- Embeddings: Uses the exact timeout value
- Chat operations: Uses 4x timeout (120s default) for longer responses
- Vision/Image processing: Uses 10x timeout (300s default) for complex OCR
Optimize text chunking for your use case with character-based chunking:
- max_chars: Maximum characters per chunk (default: 2000, optimized for modern RAG practices)
- overlap: Character overlap between chunks (default: 200, 10% overlap ratio)
- chunking strategy: Choose between recursive, semantic, or simple chunking
- Smaller chunks provide more precise search results
- Larger chunks preserve more context per result
# Optimize for precision (smaller chunks)
lil-rag config set chunking.max-chars 1000
lil-rag config set chunking.overlap 100
# Optimize for context (larger chunks)
lil-rag config set chunking.max-chars 4000
lil-rag config set chunking.overlap 400
# Use minimal chunking for simple text
lil-rag config set chunking.max-chars 500
lil-rag config set chunking.overlap 50

Note: The system has migrated from token-based to character-based chunking for more predictable and consistent results across different text types and languages.
# Set Ollama endpoint
lil-rag config set ollama.endpoint http://192.168.1.100:11434
# Change embedding model
lil-rag config set ollama.model all-MiniLM-L6-v2
# Change chat model
lil-rag config set ollama.chat-model llama3.2
# Change vision model for image processing
lil-rag config set ollama.vision-model llama3.2-vision
# Update Ollama timeout (in seconds)
lil-rag config set ollama.timeout-seconds 60
# Update vector size (must match embedding model)
lil-rag config set ollama.vector-size 384
# Change data directory
lil-rag config set data.dir /path/to/my/data
# Update server settings
lil-rag config set server.port 9000

package main
import (
	"context"
	"fmt"
	"log"
	"os"
	"path/filepath"

	"lil-rag/pkg/lilrag"
)

func main() {
	// Create configuration
	homeDir, _ := os.UserHomeDir()
	dataDir := filepath.Join(homeDir, ".lilrag", "data")
	config := &lilrag.Config{
		DatabasePath:   filepath.Join(dataDir, "test.db"),
		DataDir:        dataDir,
		OllamaURL:      "http://localhost:11434",
		Model:          "nomic-embed-text",
		ChatModel:      "gemma3:4b",
		VisionModel:    "llama3.2-vision",
		TimeoutSeconds: 30,
		VectorSize:     768,
		MaxChars:       2000,
		Overlap:        200,
	}

	// Initialize LilRag
	rag, err := lilrag.New(config)
	if err != nil {
		log.Fatal(err)
	}
	defer rag.Close()

	if err := rag.Initialize(); err != nil {
		log.Fatal(err)
	}

	ctx := context.Background()

	// Index content - note the parameter order: text first, then id
	err = rag.Index(ctx, "This is a document about Go programming", "doc1")
	if err != nil {
		log.Fatal(err)
	}

	// Search for similar content
	results, err := rag.Search(ctx, "Go programming", 5)
	if err != nil {
		log.Fatal(err)
	}

	for _, result := range results {
		fmt.Printf("ID: %s, Score: %.4f\n", result.ID, result.Score)
		fmt.Printf("Text: %s\n\n", result.Text)
	}
}

# Run tests
make test
# Build for current platform
make build
# Build for all platforms (Linux, macOS, Windows)
make build-cross
# Format code
make fmt
# Lint code
make lint
# Clean build artifacts
make clean
# Install binaries to $GOPATH/bin
make install
# Show current version
make version

The project uses semantic versioning stored in the VERSION file. When code is merged to the main branch, the build system automatically:
- Increments the patch version (e.g., 1.0.0 → 1.0.1)
- Builds cross-platform binaries for Linux, macOS, and Windows
- Embeds the version into the binaries at build time
- Creates release archives with checksums
- Updates the VERSION file in the repository
The CI/CD system builds binaries using native platform runners to avoid CGO cross-compilation issues:
- Linux: AMD64, ARM64 (built on Ubuntu runners)
- macOS: AMD64 (Intel), ARM64 (Apple Silicon) (built on macOS runners)
- Windows: AMD64 (built on Windows runners)
This approach uses pre-compiled Go binaries on each platform for reliable builds with CGO dependencies.
All binaries include the version information and can be checked with:
lil-rag --version
lil-rag-server --version

lil-rag/
├── cmd/                    # Main applications
│   ├── lil-rag/            # CLI application
│   └── lil-rag-server/     # HTTP API server
├── pkg/                    # Public library packages
│   ├── lilrag/             # Core RAG functionality
│   │   ├── storage.go      # SQLite + sqlite-vec storage
│   │   ├── embedder.go     # Ollama integration
│   │   ├── chunker.go      # Text chunking logic
│   │   ├── compression.go  # Gzip compression
│   │   ├── pdf.go          # PDF parsing
│   │   └── lilrag.go       # Main library interface
│   └── config/             # Configuration management
├── internal/               # Private application code
│   └── handlers/           # HTTP request handlers
├── examples/               # Example programs
│   ├── library/            # Library usage example
│   └── profile/            # Profile config example
├── .github/                # GitHub templates and workflows
│   ├── workflows/          # CI/CD pipelines
│   └── ISSUE_TEMPLATE/     # Issue templates
└── docs/                   # Additional documentation
- Storage Layer: SQLite with sqlite-vec for efficient vector operations
- Embedding Layer: Ollama integration with configurable models
- Processing Layer: Text chunking, PDF parsing, and compression
- API Layer: REST endpoints and CLI interface
- Configuration: Profile-based user configuration system
- Profile config location: ~/.lilrag/config.json
- Initialize config if missing: lil-rag config init
- Check config values: lil-rag config show
- Reset to defaults: delete the config file and run lil-rag config init
- Ensure sqlite-vec is installed and available in your SQLite
- The extension file should be accessible as vec0
- Verify Ollama is running: ollama list
- Check the Ollama URL: lil-rag config show
- Update the endpoint: lil-rag config set ollama.endpoint http://localhost:11434
- Ensure the embedding model is pulled: ollama pull nomic-embed-text
- Different models have different vector sizes
- Common sizes: 768 (nomic-embed-text), 384 (all-MiniLM-L6-v2), 1536 (text-embedding-ada-002)
- Update the vector size: lil-rag config set ollama.vector-size 768
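To see why the configured size must match the embedding model: similarity between vectors of different dimensions is undefined. A minimal cosine-similarity sketch with an explicit dimension check (illustrative only; Lil-RAG performs this comparison inside sqlite-vec):

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two vectors, or an error
// if their dimensions differ (e.g. a 384-dim query embedding against
// a 768-dim index).
func cosine(a, b []float64) (float64, error) {
	if len(a) != len(b) {
		return 0, fmt.Errorf("dimension mismatch: %d vs %d", len(a), len(b))
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb)), nil
}

func main() {
	s, _ := cosine([]float64{1, 0}, []float64{1, 0})
	fmt.Println(s) // identical vectors score 1
	_, err := cosine(make([]float64, 384), make([]float64, 768))
	fmt.Println(err) // dimension mismatch: 384 vs 768
}
```

This is why switching embedding models (say, from nomic-embed-text to all-MiniLM-L6-v2) requires updating ollama.vector-size and reindexing.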
- Files are stored in the configured data directory
- Check the location: lil-rag config show
- Change the location: lil-rag config set data.dir /path/to/data
- Ensure write permissions to the directory
- Ensure a vision model is available: ollama list | grep vision
- Pull a vision model if missing: ollama pull llama3.2-vision
- Change the vision model: lil-rag config set ollama.vision-model llava
- Supported models: llama3.2-vision, llava, bakllava, moondream, etc.
- Increase the timeout for slow operations: lil-rag config set ollama.timeout-seconds 120
- Chat timeouts use 4x the base timeout (default: 120s)
- Vision processing uses 10x base timeout (default: 300s)
- Monitor /api/metrics for average response times
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.