A Sidecar service for applications that need vector database functionality to augment their LLMs. It provides embedding and retrieval capabilities by abstracting embedding generation (via LiteLLM) and vector storage and search (via Qdrant).
- What is this service?
- Deployment Pattern
- Integration Methods
- Observability & Distributed Tracing
- Development, Contributing and Deployment
This service is designed to be deployed alongside your main application as a companion service. It provides vector database functionality without requiring your main application to handle the complexity of:
- Document chunking and processing
- Embedding generation via LiteLLM
- Vector storage and retrieval via Qdrant
- Metadata management and data isolation
Key characteristics:
- Single-tenant: Designed for one-to-one deployment with your application
- Focused responsibility: Handles only vector database operations
- API-driven: Communicates with your application via REST API
- Intelligent Processing: Uses semantic chunking for optimal document understanding
- Full Observability: Built-in distributed tracing and comprehensive monitoring
- Enterprise-Grade: Production-ready with complete observability and distributed context propagation
This service is a good fit for applications that need:
- RAG (Retrieval-Augmented Generation): Store and retrieve relevant documents to augment LLM responses (see the sketch after this list)
- Semantic Search: Find similar content based on meaning, not just keywords
- Document Management: Process and store documents with intelligent semantic chunking
- Knowledge Bases: Build searchable knowledge repositories for your application
- Large Document Processing: Handle documents up to 50MB with semantic chunking for optimal search quality
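To make the RAG use case concrete, here is a minimal sketch of a retrieve-then-prompt flow built on the documented client methods. `callYourLLM` and the shape of the search results are assumptions for illustration; check the OpenAPI spec for the actual response format:

```typescript
import { VectorsGatewayClient } from '@url4irl/vectors-gateway';

// Hypothetical stand-in for whatever LLM client your application already uses
declare function callYourLLM(prompt: string): Promise<string>;

const client = new VectorsGatewayClient('your-api-key', 'http://my-vectors-gateway-url');

async function answerWithRag(question: string, userId: number, knowledgeBaseId: number) {
  // Retrieve the chunks most relevant to the question
  const results = await client.searchKnowledgeBase(question, userId, knowledgeBaseId, {
    limit: 5,
    scoreThreshold: 0.7,
  });

  // The exact result shape is an assumption — consult the OpenAPI spec
  const context = results.map((r: any) => r.content).join('\n---\n');

  // Ground the LLM's answer in the retrieved context
  const prompt = `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
  return callYourLLM(prompt);
}
```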
```mermaid
graph TB
%% Main Application
subgraph "Your Application"
UI[User Interface]
BL[Business Logic]
API[API Layer]
end
%% Vectors Gateway (Sidecar)
subgraph "Vectors Gateway (Sidecar)"
VG_API[API Server]
DOC[Document Processor]
EMB[Embeddings Service]
VEC[Vector Storage]
end
%% External Services
LiteLLM[LiteLLM Service]
Qdrant[(Qdrant Vector DB)]
PostgreSQL[(PostgreSQL)]
Langfuse[Langfuse<br/>Observability]
%% Client Library
CLIENT[Client Library<br/>lib/client]
%% Connections
UI --> BL
BL --> API
API -->|HTTP/REST| VG_API
API -->|TypeScript Client| CLIENT
CLIENT -->|HTTP/REST| VG_API
%% Vectors Gateway Internal Flow
VG_API --> DOC
VG_API --> EMB
DOC --> EMB
EMB --> LiteLLM
DOC --> VEC
VEC --> Qdrant
DOC --> PostgreSQL
%% Observability Flow
VG_API -.->|Traces & Metrics| Langfuse
EMB -.->|Performance Metrics| Langfuse
%% Styling
classDef app fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
classDef sidecar fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
classDef external fill:#fff3e0,stroke:#f57c00,stroke-width:2px
classDef database fill:#e8f5e8,stroke:#388e3c,stroke-width:2px
classDef client fill:#fce4ec,stroke:#c2185b,stroke-width:2px
classDef observability fill:#f1f8e9,stroke:#689f38,stroke-width:2px
class UI,BL,API app
class VG_API,DOC,EMB,VEC sidecar
class LiteLLM external
class Qdrant,PostgreSQL database
class CLIENT client
class Langfuse observability
```
Key Points:
- Sidecar Pattern: Vectors Gateway runs alongside your application
- Two Integration Options: Use the TypeScript client library or direct API calls
- Single Responsibility: Your app handles business logic, Vectors Gateway handles vector operations
- Complete Observability: All operations are tracked and monitored through Langfuse
- Distributed Tracing: Full request context maintained across service boundaries
- Data Isolation: All operations are isolated by userId, knowledgeBaseId, and documentId
For TypeScript/JavaScript applications, you can use the included client library:
```typescript
import { VectorsGatewayClient } from '@url4irl/vectors-gateway';

const client = new VectorsGatewayClient('your-api-key', 'http://my-vectors-gateway-url');

// Store a document
const response = await client.storeDocument(
  'Your document content here',
  123, // userId
  456, // knowledgeBaseId
  789 // documentId
);

// Search across knowledge base
const results = await client.searchKnowledgeBase(
  'machine learning algorithms',
  123, // userId
  456, // knowledgeBaseId
  { limit: 10, scoreThreshold: 0.8 }
);

// Search within specific document
const docResults = await client.searchInDocument(
  'neural networks',
  123, // userId
  456, // knowledgeBaseId
  789, // documentId
  { limit: 5 }
);

// Delete a document
await client.deleteDocument(789, 123, 456);

// Check service health
const health = await client.healthCheck();
```
For other languages or direct API usage, use the OpenAPI specification. It is available at `/docs` when the service is running; see `openapi.json` for the complete specification.
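As a rough illustration of direct API usage, a search request might look like the sketch below. The endpoint path, request body shape, and auth header name are assumptions here — verify all of them against the spec:

```typescript
// Hypothetical search request — verify the path, payload, and auth
// header against the OpenAPI specification before relying on this.
const response = await fetch('http://my-vectors-gateway-url/v1/search', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': 'your-api-key', // header name is an assumption
  },
  body: JSON.stringify({
    query: 'machine learning algorithms',
    userId: 123,
    knowledgeBaseId: 456,
    limit: 10,
    scoreThreshold: 0.8,
  }),
});
const results = await response.json();
```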
The included TypeScript client library provides:
- Type Safety: Full TypeScript support with Zod validation (see the sketch after this list)
- Method Chaining: Intuitive API with methods like `searchKnowledgeBase()` and `searchInDocument()`
- Error Handling: Built-in error handling with descriptive messages
- Request Validation: Automatic validation of request parameters
- Health Monitoring: Built-in health check and service info methods
- Explicit Configuration: Required base URL and API key parameters for clear configuration
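The library's internal schemas aren't shown here, but a minimal sketch of the kind of Zod validation it performs on search parameters might look like this (field names mirror the documented method signatures; the real schemas may differ):

```typescript
import { z } from 'zod';

// Illustrative only — the client's actual schemas may differ
const searchParamsSchema = z.object({
  query: z.string().min(1),
  userId: z.number().int().positive(),
  knowledgeBaseId: z.number().int().positive(),
  limit: z.number().int().positive().optional(),
  scoreThreshold: z.number().min(0).max(1).optional(),
});

// Throws a descriptive ZodError if a caller passes bad parameters
searchParamsSchema.parse({
  query: 'neural networks',
  userId: 123,
  knowledgeBaseId: 456,
});
```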
As a Sidecar service, the Vectors Gateway operates as follows:
- API Communication: Your main application communicates with this service via REST API calls
- Authentication: All requests require an API key for security and data isolation
- Document Processing: When your app needs to store documents:
- Documents are automatically chunked using semantic chunking for optimal content understanding
- Each chunk is embedded through LiteLLM
- Vectors are stored in Qdrant with metadata in PostgreSQL
- Semantic Search: When your app needs to retrieve relevant content:
- Query is embedded using the same model
- Similar vectors are found in Qdrant
- Results are returned with similarity scores
- Data Isolation: All operations are isolated by `userId`, `knowledgeBaseId`, and `documentId`
- Flexible Search Scope:
- Knowledge Base Level: Search across all documents in a knowledge base
- Document Level: Search within a specific document
- Configurable Scoring: Adjustable similarity threshold (default: 0.5)
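To make the search scope and scoring options concrete, the sketch below contrasts a broad knowledge-base search at the default threshold with a stricter, document-scoped search (it reuses the `client` from the Client Library example above; values are illustrative):

```typescript
// Broad recall: knowledge-base scope, default similarity threshold (0.5)
const broad = await client.searchKnowledgeBase('vector databases', 123, 456);

// High precision: document scope with a stricter threshold —
// fewer results, but each one is a closer semantic match
const precise = await client.searchInDocument('vector databases', 123, 456, 789, {
  limit: 3,
  scoreThreshold: 0.85,
});
```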
The Vectors Gateway includes comprehensive observability features powered by Langfuse and distributed tracing to provide complete visibility into your vector operations.
The service automatically tracks and monitors all operations through Langfuse:
- Document Processing Pipeline: Complete visibility into document ingestion, chunking, embedding generation, and vector storage
- Search Operations: Track query embedding, vector similarity search, and result ranking
- Performance Metrics: Monitor embedding generation time, storage performance, and search latency
- Error Tracking: Comprehensive error logging with full context and stack traces
- User Analytics: Track usage patterns, document processing volumes, and search performance per user
The service implements Distributed Tracing, allowing you to:
- Track requests across service boundaries
- Maintain request context through the entire call chain
- Correlate logs and metrics across multiple services
- Debug complex distributed systems
To maintain request context across service boundaries, the service accepts multiple trace header formats for maximum compatibility (a sketch of attaching them manually follows this list):
- `x-trace-id`: Primary trace ID header
- `x-b3-traceid`: B3 format (Zipkin compatibility)
- `traceparent`: OpenTelemetry format
- `x-span-id`: Span ID for nested operations
- `x-parent-trace-id`: Parent trace context
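If you call the API directly rather than through the client library, attach any of these headers yourself. A minimal sketch, assuming the same hypothetical endpoint as the direct-API example above:

```typescript
import { randomUUID } from 'node:crypto';

// Reuse an incoming trace ID where you have one; otherwise start a new trace
const traceId = randomUUID();

await fetch('http://my-vectors-gateway-url/v1/search', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': 'your-api-key', // header name is an assumption
    'x-trace-id': traceId, // primary trace ID header
    'x-span-id': randomUUID(), // span for this specific operation
  },
  body: JSON.stringify({ query: 'neural networks', userId: 123, knowledgeBaseId: 456 }),
});
```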
The TypeScript client library includes built-in support for distributed tracing:
```typescript
import { VectorsGatewayClient, TraceUtils } from '@url4irl/vectors-gateway';

// Create client with trace headers
const traceId = TraceUtils.generateTraceId();
const client = new VectorsGatewayClient(apiKey, baseUrl, { traceId });

// All operations will be linked in the same trace
const results = await client.searchKnowledgeBase('machine learning', 1, 1);
const stored = await client.storeDocument('AI content', 1, 1, 123);

// Create child spans for specific operations
const childClient = client.withTraceHeaders(
  TraceUtils.createChildSpanHeaders(traceId)
);
await childClient.deleteDocument(123, 1, 1);
```
All operations will appear as linked spans in Langfuse, providing complete visibility into your vector operations.
```bash
pnpm install
pnpm dev
# Service will run on http://localhost:4000
```
This starts the Express app and some Docker services (see `dev/docker-compose.yml`).
OpenAPI is served by `lib/docs.ts` from `openapi.json`. Update the JSON file when changing endpoints.
You'll need a running LiteLLM instance (with embeddings support), Qdrant and a Postgres database. The provided Docker Compose file for local development includes a PostgreSQL database and Qdrant instance.
Swagger UI is available at `/docs` when the service is running. OpenAPI spec: `openapi.json`.
- `PORT` (default: 4000)
- `NODE_ENV` (default: development)
- `API_KEY` (required) - API key for authentication
- `LITELLM_BASE_URL` (e.g., http://localhost:4000 for your LiteLLM proxy)
- `LITELLM_API_KEY` (you must generate an API key from your LiteLLM instance)
- `QDRANT_URL` (default: http://localhost:6333)
- `QDRANT_API_KEY` (optional)
- `QDRANT_COLLECTION_NAME` (default: documents)
- `DEFAULT_EMBEDDING_MODEL` (default: `openai/bge-m3:latest`. Note: this is not an OpenAI model but BAAI's bge-m3; the `openai/` prefix tells LiteLLM to use the OpenAI API format (via Ollama).)
- `LANGFUSE_PUBLIC_KEY` (required) - Your Langfuse public key
- `LANGFUSE_SECRET_KEY` (required) - Your Langfuse secret key
- `LANGFUSE_BASE_URL` (required) - Langfuse instance URL
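For reference, a local `.env` might look like the sketch below. All values are illustrative placeholders, not recommended settings:

```bash
PORT=4000
NODE_ENV=development
API_KEY=change-me
LITELLM_BASE_URL=http://localhost:4000
LITELLM_API_KEY=sk-your-litellm-key
QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION_NAME=documents
DEFAULT_EMBEDDING_MODEL=openai/bge-m3:latest
LANGFUSE_PUBLIC_KEY=pk-lf-your-public-key
LANGFUSE_SECRET_KEY=sk-lf-your-secret-key
LANGFUSE_BASE_URL=https://your-langfuse-instance
```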
Database migrations are managed using Drizzle ORM. In a production environment, migrations must be applied manually by accessing the running container and executing the following command within it:
```bash
pnpm drizzle-kit migrate --config ./dist/drizzle.config.js
```
This command will apply any pending schema changes to the database. Ensure you run this command after any deployment that includes database schema modifications.
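With a Docker-based deployment, that typically looks something like this (the container name is a placeholder for your own):

```bash
docker exec -it <your-container-name> pnpm drizzle-kit migrate --config ./dist/drizzle.config.js
```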
In development, create and apply migrations using:
```bash
pnpm run db:generate # Generates a new migration file
pnpm run db:migrate # Applies the migration to the database
```
When code changes are pushed to the repository, the container is rebuilt and the updated service is deployed.
Use the Dockerfile to deploy this service wherever you want.
All environment variables documented in the Environment Variables section are required (except those marked optional).
This service needs a running LiteLLM instance, Qdrant, and a Postgres database.
Contributions are always welcome ❤️