A Sidecar service for applications that need vector database functionality to augment their LLMs. It provides embedding and retrieval capabilities by abstracting embedding generation (via LiteLLM) and vector storage and search (via Qdrant).
- What is this service?
- Deployment Pattern
- Integration Methods
- Observability & Distributed Tracing
- Development, Contributing and Deployment
This service is designed to be deployed alongside your main application as a companion service. It provides vector database functionality without requiring your main application to handle the complexity of:
- Document chunking and processing
- Embedding generation via LiteLLM
- Vector storage and retrieval via Qdrant
- Metadata management and data isolation
Key characteristics:
- Single-tenant: Designed for one-to-one deployment with your application
- Focused responsibility: Handles only vector database operations
- API-driven: Communicates with your application via REST API
- Intelligent Processing: Uses semantic chunking for optimal document understanding
- Full Observability: Built-in distributed tracing and comprehensive monitoring
- Enterprise-Grade: Production-ready with complete observability and distributed context propagation
This service is a good fit for applications that need:
- RAG (Retrieval-Augmented Generation): Store and retrieve relevant documents to augment LLM responses (see the sketch after this list)
- Semantic Search: Find similar content based on meaning, not just keywords
- Document Management: Process and store documents with intelligent semantic chunking
- Knowledge Bases: Build searchable knowledge repositories for your application
- Large Document Processing: Handle documents up to 50MB with semantic chunking for optimal search quality
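To make the RAG use case concrete, here is a minimal sketch of a retrieve-then-prompt flow built on the documented client methods. `callYourLLM` and the shape of the search results are assumptions for illustration; check the OpenAPI spec for the actual response format:

```typescript
import { VectorsGatewayClient } from '@url4irl/vectors-gateway';

// Hypothetical stand-in for whatever LLM client your application already uses
declare function callYourLLM(prompt: string): Promise<string>;

const client = new VectorsGatewayClient('your-api-key', 'http://my-vectors-gateway-url');

async function answerWithRag(question: string, userId: number, knowledgeBaseId: number) {
  // Retrieve the chunks most relevant to the question
  const results = await client.searchKnowledgeBase(question, userId, knowledgeBaseId, {
    limit: 5,
    scoreThreshold: 0.7,
  });

  // The exact result shape is an assumption — consult the OpenAPI spec
  const context = results.map((r: any) => r.content).join('\n---\n');

  // Ground the LLM's answer in the retrieved context
  const prompt = `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
  return callYourLLM(prompt);
}
```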
```mermaid
graph TB
%% Main Application
subgraph "Your Application"
UI[User Interface]
BL[Business Logic]
API[API Layer]
end
%% Vectors Gateway (Sidecar)
subgraph "Vectors Gateway (Sidecar)"
VG_API[API Server]
DOC[Document Processor]
EMB[Embeddings Service]
VEC[Vector Storage]
end
%% External Services
LiteLLM[LiteLLM Service]
Qdrant[(Qdrant Vector DB)]
PostgreSQL[(PostgreSQL)]
Langfuse[Langfuse<br/>Observability]
%% Client Library
CLIENT[Client Library<br/>lib/client]
%% Connections
UI --> BL
BL --> API
API -->|HTTP/REST| VG_API
API -->|TypeScript Client| CLIENT
CLIENT -->|HTTP/REST| VG_API
%% Vectors Gateway Internal Flow
VG_API --> DOC
VG_API --> EMB
DOC --> EMB
EMB --> LiteLLM
DOC --> VEC
VEC --> Qdrant
DOC --> PostgreSQL
%% Observability Flow
VG_API -.->|Traces & Metrics| Langfuse
EMB -.->|Performance Metrics| Langfuse
%% Styling
classDef app fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef sidecar fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef external fill:#fff3e0,stroke:#f57c00,stroke-width:2px
classDef database fill:#e8f5e8,stroke:#388e3c,stroke-width:2px
classDef client fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef observability fill:#f1f8e9,stroke:#689f38,stroke-width:2px
class UI,BL,API app
class VG_API,DOC,EMB,VEC sidecar
class LiteLLM external
class Qdrant,PostgreSQL database
class CLIENT client
class Langfuse observability
```
Key Points:
- Sidecar Pattern: Vectors Gateway runs alongside your application
- Two Integration Options: Use the TypeScript client library or direct API calls
- Single Responsibility: Your app handles business logic, Vectors Gateway handles vector operations
- Complete Observability: All operations are tracked and monitored through Langfuse
- Distributed Tracing: Full request context maintained across service boundaries
- Data Isolation: All operations are isolated by userId, knowledgeBaseId, and documentId
For TypeScript/JavaScript applications, you can use the included client library:
```typescript
import { VectorsGatewayClient } from '@url4irl/vectors-gateway';

const client = new VectorsGatewayClient('your-api-key', 'http://my-vectors-gateway-url');

// Store a document
const response = await client.storeDocument(
  'Your document content here',
  123, // userId
  456, // knowledgeBaseId
  789 // documentId
);

// Search across knowledge base
const results = await client.searchKnowledgeBase(
  'machine learning algorithms',
  123, // userId
  456, // knowledgeBaseId
  { limit: 10, scoreThreshold: 0.8 }
);

// Search within specific document
const docResults = await client.searchInDocument(
  'neural networks',
  123, // userId
  456, // knowledgeBaseId
  789, // documentId
  { limit: 5 }
);

// Delete a document
await client.deleteDocument(789, 123, 456);

// Check service health
const health = await client.healthCheck();
```
For other languages or direct API usage, use the OpenAPI specification. It is available at `/docs` when the service is running; see `openapi.json` for the complete specification.
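As a rough illustration of direct API usage, a search request might look like the sketch below. The endpoint path, request body shape, and auth header name are assumptions here — verify all of them against the spec:

```typescript
// Hypothetical search request — verify the path, payload, and auth
// header against the OpenAPI specification before relying on this.
const response = await fetch('http://my-vectors-gateway-url/v1/search', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': 'your-api-key', // header name is an assumption
  },
  body: JSON.stringify({
    query: 'machine learning algorithms',
    userId: 123,
    knowledgeBaseId: 456,
    limit: 10,
    scoreThreshold: 0.8,
  }),
});
const results = await response.json();
```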
The included TypeScript client library provides:
- Type Safety: Full TypeScript support with Zod validation (see the sketch after this list)
- Method Chaining: Intuitive API with methods like `searchKnowledgeBase()` and `searchInDocument()`
- Error Handling: Built-in error handling with descriptive messages
- Request Validation: Automatic validation of request parameters
- Health Monitoring: Built-in health check and service info methods
- Explicit Configuration: Required base URL and API key parameters for clear configuration
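The library's internal schemas aren't shown here, but a minimal sketch of the kind of Zod validation it performs on search parameters might look like this (field names mirror the documented method signatures; the real schemas may differ):

```typescript
import { z } from 'zod';

// Illustrative only — the client's actual schemas may differ
const searchParamsSchema = z.object({
  query: z.string().min(1),
  userId: z.number().int().positive(),
  knowledgeBaseId: z.number().int().positive(),
  limit: z.number().int().positive().optional(),
  scoreThreshold: z.number().min(0).max(1).optional(),
});

// Throws a descriptive ZodError if a caller passes bad parameters
searchParamsSchema.parse({
  query: 'neural networks',
  userId: 123,
  knowledgeBaseId: 456,
});
```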
As a Sidecar service, the Vectors Gateway operates as follows:
- API Communication: Your main application communicates with this service via REST API calls
- Authentication: All requests require an API key for security and data isolation
- Document Processing: When your app needs to store documents:
- Documents are automatically chunked using semantic chunking for optimal content understanding
- Each chunk is embedded through LiteLLM
- Vectors are stored in Qdrant with metadata in PostgreSQL
- Semantic Search: When your app needs to retrieve relevant content:
- Query is embedded using the same model
- Similar vectors are found in Qdrant
- Results are returned with similarity scores
- Data Isolation: All operations are isolated by `userId`, `knowledgeBaseId`, and `documentId`
- Flexible Search Scope:
- Knowledge Base Level: Search across all documents in a knowledge base
- Document Level: Search within a specific document
- Configurable Scoring: Adjustable similarity threshold (default: 0.5)
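To make the search scope and scoring options concrete, the sketch below contrasts a broad knowledge-base search at the default threshold with a stricter, document-scoped search (it reuses the `client` from the Client Library example above; values are illustrative):

```typescript
// Broad recall: knowledge-base scope, default similarity threshold (0.5)
const broad = await client.searchKnowledgeBase('vector databases', 123, 456);

// High precision: document scope with a stricter threshold —
// fewer results, but each one is a closer semantic match
const precise = await client.searchInDocument('vector databases', 123, 456, 789, {
  limit: 3,
  scoreThreshold: 0.85,
});
```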
The Vectors Gateway includes comprehensive observability features powered by Langfuse and distributed tracing to provide complete visibility into your vector operations.
The service automatically tracks and monitors all operations through Langfuse:
- Document Processing Pipeline: Complete visibility into document ingestion, chunking, embedding generation, and vector storage
- Search Operations: Track query embedding, vector similarity search, and result ranking
- Performance Metrics: Monitor embedding generation time, storage performance, and search latency
- Error Tracking: Comprehensive error logging with full context and stack traces
- User Analytics: Track usage patterns, document processing volumes, and search performance per user
The service implements Distributed Tracing, allowing you to:
- Track requests across service boundaries
- Maintain request context through the entire call chain
- Correlate logs and metrics across multiple services
- Debug complex distributed systems
To maintain request context across service boundaries, the service accepts multiple trace header formats for maximum compatibility (a sketch of attaching them manually follows this list):
- `x-trace-id`: Primary trace ID header
- `x-b3-traceid`: B3 format (Zipkin compatibility)
- `traceparent`: OpenTelemetry format
- `x-span-id`: Span ID for nested operations
- `x-parent-trace-id`: Parent trace context
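If you call the API directly rather than through the client library, attach any of these headers yourself. A minimal sketch, assuming the same hypothetical endpoint as the direct-API example above:

```typescript
import { randomUUID } from 'node:crypto';

// Reuse an incoming trace ID where you have one; otherwise start a new trace
const traceId = randomUUID();

await fetch('http://my-vectors-gateway-url/v1/search', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': 'your-api-key', // header name is an assumption
    'x-trace-id': traceId, // primary trace ID header
    'x-span-id': randomUUID(), // span for this specific operation
  },
  body: JSON.stringify({ query: 'neural networks', userId: 123, knowledgeBaseId: 456 }),
});
```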
The TypeScript client library includes built-in support for distributed tracing:
```typescript
import { VectorsGatewayClient, TraceUtils } from '@url4irl/vectors-gateway';

// Create client with trace headers
const traceId = TraceUtils.generateTraceId();
const client = new VectorsGatewayClient(apiKey, baseUrl, { traceId });

// All operations will be linked in the same trace
const results = await client.searchKnowledgeBase('machine learning', 1, 1);
const stored = await client.storeDocument('AI content', 1, 1, 123);

// Create child spans for specific operations
const childClient = client.withTraceHeaders(
  TraceUtils.createChildSpanHeaders(traceId)
);
await childClient.deleteDocument(123, 1, 1);
```
All operations will appear as linked spans in Langfuse, providing complete visibility into your vector operations.
```bash
pnpm install
pnpm dev
# Service will run on http://localhost:4000
```
This starts the Express app and some Docker services (see `dev/docker-compose.yml`).
OpenAPI is served by `lib/docs.ts` from `openapi.json`. Update the JSON file when changing endpoints.
You'll need a running LiteLLM instance (with embeddings support), Qdrant and a Postgres database. The provided Docker Compose file for local development includes a PostgreSQL database and Qdrant instance.
Swagger UI is available at `/docs` when the service is running. OpenAPI spec: `openapi.json`.
- `PORT` (default: 4000)
- `NODE_ENV` (default: development)
- `API_KEY` (required) - API key for authentication
- `LITELLM_BASE_URL` (e.g., http://localhost:4000 for your LiteLLM proxy)
- `LITELLM_API_KEY` (you must generate an API key from your LiteLLM instance)
- `QDRANT_URL` (default: http://localhost:6333)
- `QDRANT_API_KEY` (optional)
- `QDRANT_COLLECTION_NAME` (default: documents)
- `DEFAULT_EMBEDDING_MODEL` (default: `openai/bge-m3:latest`. Note: this is not an OpenAI model but BAAI's bge-m3; the `openai/` prefix tells LiteLLM to use the OpenAI API format (via Ollama).)
- `LANGFUSE_PUBLIC_KEY` (required) - Your Langfuse public key
- `LANGFUSE_SECRET_KEY` (required) - Your Langfuse secret key
- `LANGFUSE_BASE_URL` (required) - Langfuse instance URL
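For reference, a local `.env` might look like the sketch below. All values are illustrative placeholders, not recommended settings:

```bash
PORT=4000
NODE_ENV=development
API_KEY=change-me
LITELLM_BASE_URL=http://localhost:4000
LITELLM_API_KEY=sk-your-litellm-key
QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION_NAME=documents
DEFAULT_EMBEDDING_MODEL=openai/bge-m3:latest
LANGFUSE_PUBLIC_KEY=pk-lf-your-public-key
LANGFUSE_SECRET_KEY=sk-lf-your-secret-key
LANGFUSE_BASE_URL=https://your-langfuse-instance
```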
Database migrations are managed using Drizzle ORM. In a production environment, migrations must be applied manually by accessing the running container and executing the following command within it:
```bash
pnpm drizzle-kit migrate --config ./dist/drizzle.config.js
```
This command will apply any pending schema changes to the database. Ensure you run this command after any deployment that includes database schema modifications.
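With a Docker-based deployment, that typically looks something like this (the container name is a placeholder for your own):

```bash
docker exec -it <your-container-name> pnpm drizzle-kit migrate --config ./dist/drizzle.config.js
```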
In development, create and apply migrations using:
```bash
pnpm run db:generate # Generates a new migration file
pnpm run db:migrate # Applies the migration to the database
```
When code changes are pushed to the repository, the container is rebuilt and the updated service is deployed.
Use the Dockerfile to deploy this service wherever you want.
All environment variables documented in the Environment Variables section are required (except those marked optional).
This service needs a running LiteLLM instance, Qdrant, and a Postgres database.
Contributions are always welcome ❤️