Technical Setup Guide

This guide is a technical reference companion to the blog post "What Would It Take for a Pharma Company to Run AI On Its Own Infrastructure?" It walks through one possible production architecture for self-hosting Open WebUI, along with configuration examples that organizations in regulated industries may find relevant. This is a starting point for evaluation, not a prescriptive deployment guide - your organization's engineering, quality, and compliance teams should adapt this architecture to your specific requirements.


Table of Contents

  1. Architecture Overview
  2. Pre-Requisites
  3. Docker Compose Reference
  4. Setup Script
  5. Environment Variable Reference
  6. Technical Controls Reference
  7. RBAC Configuration Guide
  8. Knowledge Base Setup Guide
  9. Security Hardening Checklist
  10. Backup & Disaster Recovery

Architecture Overview

The production stack is identical to any Open WebUI enterprise deployment: reverse proxy with TLS, stateless application nodes, PostgreSQL + PGVector for data and vector search, Redis for session coordination, local inference via Ollama and vLLM, and Open Terminal for sandboxed code execution. See the blog post for the architecture diagram and rationale.

This deployment also includes the Inline Visualizer tool and skill, which renders interactive HTML/SVG visualizations directly in the chat. When combined with Open Terminal for computational work, this gives scientists two complementary paths to visual output: Open Terminal for publication-quality figures generated by matplotlib, RDKit, and other scientific Python libraries, and Inline Visualizer for interactive diagrams, flowcharts, and explorable visuals rendered natively in the conversation.

What makes this configuration different from a generic deployment isn't the infrastructure - it's the configuration layer on top: how access is structured, how knowledge bases map to functional groups, and how audit records are retained. The settings below are examples of how organizations in regulated industries have approached these decisions.

Example configuration decisions:

  • ENABLE_ADMIN_CHAT_ACCESS=False - Restricts IT administrators from viewing user conversation content at the application level.
  • USER_PERMISSIONS_CHAT_DELETE=False + USER_PERMISSIONS_CHAT_TEMPORARY=False - Disables chat deletion and temporary chats at the application level, so AI interactions are persisted with timestamps.
  • ENABLE_COMMUNITY_SHARING=False - Disables sharing of data, prompts, or model configurations externally.
  • BYPASS_MODEL_ACCESS_CONTROL=False - Enforces functional group boundaries. A CMC scientist sees manufacturing models and documents; a PV officer sees pharmacovigilance resources. This helps prevent cross-group data exposure.

Pre-Requisites

Hardware Requirements

For an organization with 500–10,000+ employees and concurrent usage of ~100–500 users:

| Component | Minimum | Recommended |
| --- | --- | --- |
| Open WebUI nodes | 2× (4 vCPU, 8 GB RAM each) | 3× (8 vCPU, 16 GB RAM each) |
| PostgreSQL | 4 vCPU, 16 GB RAM, 500 GB SSD | 8 vCPU, 32 GB RAM, 1 TB NVMe |
| Redis | 2 vCPU, 4 GB RAM | 2 vCPU, 8 GB RAM (Sentinel: 3 nodes) |
| Ollama (small models, ≤13B) | 1× NVIDIA GPU (24 GB VRAM, e.g., RTX 4090) | 2× GPUs behind load balancer |
| vLLM (large models, 70B+) | 2× NVIDIA A100 80 GB (tensor parallel) | 4× A100 80 GB or 2× H100 |
| Shared storage | 1 TB S3-compatible or NFS | 5 TB+ with lifecycle policies |
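
The vLLM sizing can be sanity-checked with simple arithmetic. The sketch below assumes 16-bit weights (2 bytes per parameter) and treats GB loosely; the helper names `weights_gb` and `min_gpus` are illustrative, and quantized models would need less memory. It steps the GPU count in powers of two, a common convention because the tensor-parallel size has to divide the model's attention head count.

```shell
# Rough weight-memory estimate: parameters (in billions) x bytes per parameter = GB.
# KV cache and activations come on top of this figure.
weights_gb() {            # $1 = parameters in billions, $2 = bytes per parameter
    echo $(( $1 * $2 ))
}

# Smallest power-of-two GPU count whose usable VRAM holds the weights.
# $1 = weight size in GB, $2 = VRAM per GPU in GB, $3 = utilization in percent
min_gpus() {
    local need=$1 per_gpu=$(( $2 * $3 / 100 )) n=1
    while [ $(( n * per_gpu )) -lt "$need" ]; do
        n=$(( n * 2 ))
    done
    echo "$n"
}

weights_gb 70 2          # prints 140
min_gpus 140 80 90       # prints 2  (70B model on A100 80 GB at 0.90 utilization)
min_gpus 16 24 90        # prints 1  (8B model on a 24 GB card)
```

At the minimum tier, two 80 GB cards at 90% utilization offer about 144 GB against roughly 140 GB of weights, so only a few GB remain for KV cache. Context length and concurrency will be constrained accordingly.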

Software Requirements

  • Docker Engine ≥ 24.0 and Docker Compose ≥ 2.20
  • NVIDIA Container Toolkit (for GPU nodes) - see NVIDIA's installation guide
  • TLS certificates - from your organization's internal CA or Let's Encrypt
  • LDAP / SSO credentials - for OAuth/OIDC integration (Okta, Azure AD, Ping Identity, etc.)
  • DNS entry - e.g., ai.yourcompany.com pointing to the reverse proxy

Network Requirements

  • All services communicate on an internal Docker network - no public exposure except the reverse proxy
  • Outbound internet access is not required if models are pre-pulled (fully air-gappable)
  • Ports: 443 (HTTPS) serves all external traffic; port 80 is published only to redirect HTTP to HTTPS

Docker Compose Reference

Save this as docker-compose.yml in your deployment directory. An accompanying .env file is generated by the setup script below.

# =============================================================================
# Open WebUI - Production Stack
# =============================================================================
# Usage:
#   1. Run ./setup.sh to generate .env and required directories
#   2. docker compose up -d
#   3. Access via https://ai.yourcompany.com
# =============================================================================

services:
  # ---------------------------------------------------------------------------
  # Reverse Proxy - TLS termination and load balancing
  # ---------------------------------------------------------------------------
  nginx:
    image: nginx:alpine
    container_name: owui-proxy
    restart: unless-stopped
    ports:
      - "443:443"
      - "80:80"       # Redirect to HTTPS
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/certs:/etc/nginx/certs:ro
    depends_on:
      open-webui-1:
        condition: service_healthy
      open-webui-2:
        condition: service_healthy  # nginx resolves both upstream hosts at startup
    networks:
      - owui-net

  # ---------------------------------------------------------------------------
  # Open WebUI - Stateless application nodes
  # ---------------------------------------------------------------------------
  open-webui-1:
    image: ghcr.io/open-webui/open-webui:0.6  # Pin to a specific version for production environments
    container_name: owui-node-1
    restart: unless-stopped
    environment:
      # --- Core ---
      - WEBUI_URL=${WEBUI_URL}
      - WEBUI_NAME=${WEBUI_NAME:-AI Assistant}
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}
      - PORT=8080

      # --- Database ---
      - DATABASE_URL=postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}

      # --- Vector DB (PGVector, same PostgreSQL instance) ---
      - VECTOR_DB=pgvector
      - PGVECTOR_DB_URL=postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}

      # --- Redis ---
      - REDIS_URL=redis://redis:6379/0
      - WEBSOCKET_MANAGER=redis
      - WEBSOCKET_REDIS_URL=redis://redis:6379/0
      - ENABLE_WEBSOCKET_SUPPORT=True

      # --- Inference backends ---
      - OLLAMA_BASE_URL=http://ollama:11434
      - ENABLE_OLLAMA_API=True
      - OPENAI_API_BASE_URL=http://vllm:8000/v1
      - OPENAI_API_KEY=${VLLM_API_KEY:-sk-none}
      - ENABLE_OPENAI_API=True

      # --- Security defaults ---
      - ENABLE_SIGNUP=False
      - DEFAULT_USER_ROLE=pending
      - ENABLE_ADMIN_CHAT_ACCESS=False
      - ENABLE_ADMIN_EXPORT=False
      - BYPASS_MODEL_ACCESS_CONTROL=False
      - BYPASS_ADMIN_ACCESS_CONTROL=False
      - ENABLE_COMMUNITY_SHARING=False

      # --- User permissions ---
      - USER_PERMISSIONS_CHAT_DELETE=False
      - USER_PERMISSIONS_CHAT_TEMPORARY=False

      # --- RAG tuning ---
      - RAG_TOP_K=5
      - RAG_SYSTEM_CONTEXT=True
      - ENABLE_RAG_HYBRID_SEARCH=True

      # --- Admin provisioning (first startup only) ---
      - WEBUI_ADMIN_EMAIL=${ADMIN_EMAIL}
      - WEBUI_ADMIN_PASSWORD=${ADMIN_PASSWORD}
      - WEBUI_ADMIN_NAME=${ADMIN_NAME:-IT Admin}

      # --- Workers ---
      - UVICORN_WORKERS=${UVICORN_WORKERS:-4}
      - ENABLE_DB_MIGRATIONS=True  # Only on node-1; set False on others

      # --- Observability (optional) ---
      - ENABLE_OTEL=${ENABLE_OTEL:-False}
      - OTEL_EXPORTER_OTLP_ENDPOINT=${OTEL_ENDPOINT:-}

      # --- Open Terminal integration ---
      - TERMINAL_SERVER_CONNECTIONS=${TERMINAL_SERVER_CONNECTIONS:-[]}

      # --- Persistent config ---
      - ENABLE_PERSISTENT_CONFIG=True
    volumes:
      - owui-data:/app/backend/data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - owui-net

  # Note: open-webui-2 duplicates the environment from open-webui-1 because
  # Docker Compose list-style environment blocks do not support YAML merge keys.
  # If you add or change a variable above, update it here as well.
  open-webui-2:
    image: ghcr.io/open-webui/open-webui:0.6  # Pin to a specific version for production environments
    container_name: owui-node-2
    restart: unless-stopped
    environment:
      - WEBUI_URL=${WEBUI_URL}
      - WEBUI_NAME=${WEBUI_NAME:-AI Assistant}
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}
      - PORT=8080
      - DATABASE_URL=postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}
      - VECTOR_DB=pgvector
      - PGVECTOR_DB_URL=postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}
      - REDIS_URL=redis://redis:6379/0
      - WEBSOCKET_MANAGER=redis
      - WEBSOCKET_REDIS_URL=redis://redis:6379/0
      - ENABLE_WEBSOCKET_SUPPORT=True
      - OLLAMA_BASE_URL=http://ollama:11434
      - ENABLE_OLLAMA_API=True
      - OPENAI_API_BASE_URL=http://vllm:8000/v1
      - OPENAI_API_KEY=${VLLM_API_KEY:-sk-none}
      - ENABLE_OPENAI_API=True
      - ENABLE_SIGNUP=False
      - DEFAULT_USER_ROLE=pending
      - ENABLE_ADMIN_CHAT_ACCESS=False
      - ENABLE_ADMIN_EXPORT=False
      - BYPASS_MODEL_ACCESS_CONTROL=False
      - BYPASS_ADMIN_ACCESS_CONTROL=False
      - ENABLE_COMMUNITY_SHARING=False
      - USER_PERMISSIONS_CHAT_DELETE=False
      - USER_PERMISSIONS_CHAT_TEMPORARY=False
      - RAG_TOP_K=5
      - RAG_SYSTEM_CONTEXT=True
      - ENABLE_RAG_HYBRID_SEARCH=True
      - WEBUI_ADMIN_EMAIL=${ADMIN_EMAIL}
      - WEBUI_ADMIN_PASSWORD=${ADMIN_PASSWORD}
      - WEBUI_ADMIN_NAME=${ADMIN_NAME:-IT Admin}
      - UVICORN_WORKERS=${UVICORN_WORKERS:-4}
      - ENABLE_DB_MIGRATIONS=False  # Node-1 handles migrations
      - ENABLE_OTEL=${ENABLE_OTEL:-False}
      - OTEL_EXPORTER_OTLP_ENDPOINT=${OTEL_ENDPOINT:-}
      - TERMINAL_SERVER_CONNECTIONS=${TERMINAL_SERVER_CONNECTIONS:-[]}
      - ENABLE_PERSISTENT_CONFIG=True
    volumes:
      - owui-data:/app/backend/data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - owui-net

  # ---------------------------------------------------------------------------
  # PostgreSQL 16 + PGVector - Database and vector store
  # ---------------------------------------------------------------------------
  postgres:
    image: pgvector/pgvector:pg16
    container_name: owui-postgres
    restart: unless-stopped
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=${POSTGRES_DB}
    volumes:
      - postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    networks:
      - owui-net
    # Recommended: tune postgresql.conf for your hardware
    command: >
      postgres
        -c shared_buffers=2GB
        -c effective_cache_size=6GB
        -c work_mem=64MB
        -c maintenance_work_mem=512MB
        -c max_connections=200
        -c wal_level=replica
        -c max_wal_senders=3

  # ---------------------------------------------------------------------------
  # Redis - Session management and WebSocket coordination
  # ---------------------------------------------------------------------------
  redis:
    image: redis:7-alpine
    container_name: owui-redis
    restart: unless-stopped
    command: >
      redis-server
        --maxmemory 2gb
        --maxmemory-policy allkeys-lru
        --maxclients 10000
        --timeout 1800
        --save 60 1000
        --appendonly yes
    volumes:
      - redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - owui-net

  # ---------------------------------------------------------------------------
  # Ollama - Local model inference (smaller models, ≤13B)
  # ---------------------------------------------------------------------------
  ollama:
    image: ollama/ollama:latest  # Replace "latest" with a pinned version tag for production
    container_name: owui-ollama
    restart: unless-stopped
    volumes:
      - ollama-data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    networks:
      - owui-net

  # ---------------------------------------------------------------------------
  # vLLM - GPU-optimized inference (large models, 70B+)
  # ---------------------------------------------------------------------------
  vllm:
    image: vllm/vllm-openai:latest  # Replace "latest" with a pinned version tag for production
    container_name: owui-vllm
    restart: unless-stopped
    command: >
      --model ${VLLM_MODEL:-meta-llama/Llama-3.1-70B-Instruct}
      --tensor-parallel-size ${VLLM_TP_SIZE:-2}
      --max-model-len ${VLLM_MAX_MODEL_LEN:-8192}
      --gpu-memory-utilization 0.90
      --enforce-eager
      --api-key ${VLLM_API_KEY:-sk-none}
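    environment:
      - HUGGING_FACE_HUB_TOKEN=${HF_TOKEN}
    ipc: host  # Tensor-parallel workers share data via shared memory; alternatively set a large shm_size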
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: ${VLLM_TP_SIZE:-2}
              capabilities: [gpu]
    networks:
      - owui-net

  # ---------------------------------------------------------------------------
  # Open Terminal - Sandboxed code execution for scientists
  # ---------------------------------------------------------------------------
  open-terminal:
    image: ghcr.io/open-webui/open-terminal  # No tag means "latest"; pin a version tag for production
    container_name: owui-terminal
    restart: unless-stopped
    environment:
      - OPEN_TERMINAL_API_KEY=${OPEN_TERMINAL_API_KEY}
      - OPEN_TERMINAL_PIP_PACKAGES=rdkit-pypi scikit-learn lifelines matplotlib seaborn
      - OPEN_TERMINAL_MAX_SESSIONS=16
    volumes:
      - terminal-data:/home/user
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: "4.0"
    networks:
      - owui-net

# =============================================================================
# Named Volumes
# =============================================================================
volumes:
  owui-data:
    driver: local
  postgres-data:
    driver: local
  redis-data:
    driver: local
  ollama-data:
    driver: local
  terminal-data:
    driver: local

# =============================================================================
# Network
# =============================================================================
networks:
  owui-net:
    driver: bridge
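
Because the two Open WebUI nodes carry hand-duplicated environment blocks, drift between them is easy to introduce. The following is a minimal sketch of a drift check, assuming the file keeps the exact indentation used above (service names at two spaces, list entries at six); `service_env` and `compare_nodes` are hypothetical helper names, not part of Open WebUI or Docker.

```shell
# Print the KEY=VALUE environment entries of one service from a Compose file,
# sorted, with trailing comments stripped. ENABLE_DB_MIGRATIONS is left out
# because it is meant to differ between the nodes.
service_env() {           # $1 = compose file, $2 = service name
    awk -v svc="  $2:" '
        $0 == svc                     { in_svc = 1; next }
        in_svc && /^  [^ #]/          { in_svc = 0 }
        in_svc && /^      - [A-Z_]+=/ {
            sub(/^      - /, ""); sub(/[ \t]+#.*$/, ""); print
        }
    ' "$1" | grep -v '^ENABLE_DB_MIGRATIONS=' | sort
}

# Exit status 0 means both nodes carry the same settings; otherwise a diff is printed.
compare_nodes() {         # $1 = compose file
    local tmp rc
    tmp=$(mktemp)
    service_env "$1" open-webui-2 > "$tmp"
    service_env "$1" open-webui-1 | diff - "$tmp"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Usage: compare_nodes docker-compose.yml && echo "nodes are in sync"
```

A check like this could run in CI or at the end of the setup script so that a variable added to one node but not the other is caught before deployment.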

Nginx Configuration

Save this as nginx/nginx.conf:

events {
    worker_connections 1024;
}

http {
    upstream openwebui {
        least_conn;
        server open-webui-1:8080;
        server open-webui-2:8080;
    }

    # Redirect HTTP to HTTPS
    server {
        listen 80;
        return 301 https://$host$request_uri;
    }

    server {
        listen 443 ssl;
        server_name ai.yourcompany.com;

        ssl_certificate     /etc/nginx/certs/fullchain.pem;
        ssl_certificate_key /etc/nginx/certs/privkey.pem;
        ssl_protocols       TLSv1.2 TLSv1.3;
        ssl_ciphers         HIGH:!aNULL:!MD5;

        # Security headers
        add_header Strict-Transport-Security "max-age=63072000; includeSubDomains" always;
        add_header X-Content-Type-Options "nosniff" always;
        add_header X-Frame-Options "SAMEORIGIN" always;
        add_header Referrer-Policy "strict-origin-when-cross-origin" always;

        # Max upload size for document ingestion
        client_max_body_size 100M;

        location / {
            proxy_pass http://openwebui;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            # WebSocket support (required for streaming responses)
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";

            # Timeouts for long-running LLM responses
            proxy_read_timeout 300s;
            proxy_send_timeout 300s;
        }
    }
}
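
Before real certificates arrive from your CA, a throwaway self-signed pair can be enough to confirm that nginx starts and parses this configuration. The sketch below is for test environments only; `make_test_cert` is a hypothetical helper, and the file names simply match the paths referenced in the configuration above. Browsers will not trust the result.

```shell
# Create a short-lived self-signed certificate for a TEST environment only.
# $1 = output directory, $2 = hostname placed in the certificate subject
make_test_cert() {
    mkdir -p "$1"
    openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
        -subj "/CN=$2" \
        -keyout "$1/privkey.pem" -out "$1/fullchain.pem" 2>/dev/null
    chmod 600 "$1/privkey.pem"
}

# Usage:
#   make_test_cert ./nginx/certs ai.yourcompany.com
# Once the stack is up, ask nginx to re-check its configuration:
#   docker compose exec nginx nginx -t
```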

Setup Script

Save this as setup.sh and run it before your first docker compose up:

#!/usr/bin/env bash
# =============================================================================
# Open WebUI - Production Setup Script
# =============================================================================
# This script creates the required directory structure, generates secrets,
# pulls initial models, and validates the environment before first boot.
#
# Usage: chmod +x setup.sh && ./setup.sh
# =============================================================================

set -euo pipefail

# --- Colors -----------------------------------------------------------------
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

info()  { echo -e "${GREEN}[INFO]${NC}  $1"; }
warn()  { echo -e "${YELLOW}[WARN]${NC}  $1"; }
error() { echo -e "${RED}[ERROR]${NC} $1"; exit 1; }

# --- Pre-flight checks ------------------------------------------------------
info "Running pre-flight checks..."

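command -v docker >/dev/null 2>&1 || error "Docker is not installed."
# "docker compose" is a subcommand, so probe it by running it; command -v cannot check it
docker compose version >/dev/null 2>&1 || error "Docker Compose v2 is not installed."

# Fail with a clear message (instead of a silent set -e exit) if the daemon is unreachable
DOCKER_VERSION=$(docker version --format '{{.Server.Version}}' 2>/dev/null) || error "Cannot reach the Docker daemon. Is it running?"
info "Docker version: ${DOCKER_VERSION}"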

# Check for NVIDIA GPU (optional)
if command -v nvidia-smi >/dev/null 2>&1; then
    GPU_INFO=$(nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null || echo "GPU detected but nvidia-smi query failed")
    info "GPU detected: ${GPU_INFO}"
else
    warn "No NVIDIA GPU detected. Ollama will run on CPU (slower inference)."
    warn "vLLM requires a GPU and will not start without one."
fi

# --- Create directory structure ---------------------------------------------
info "Creating directory structure..."

mkdir -p nginx/certs
mkdir -p data/ollama
mkdir -p data/postgres
mkdir -p data/redis
mkdir -p data/open-webui
mkdir -p backups

# --- Generate secrets -------------------------------------------------------
info "Generating secrets..."

generate_secret() {
    openssl rand -base64 32 | tr -d '/+=' | head -c 48
}

if [ ! -f .env ]; then
    TERMINAL_KEY=$(generate_secret)
    cat > .env << EOF
# =============================================================================
# Open WebUI - Environment Configuration
# Generated on $(date -u +"%Y-%m-%dT%H:%M:%SZ")
# =============================================================================

# --- Public URL ---
WEBUI_URL=https://ai.yourcompany.com
WEBUI_NAME=AI Assistant

# --- Secret key (used for JWT signing - KEEP THIS SECRET) ---
WEBUI_SECRET_KEY=$(generate_secret)

# --- Admin account (created on first startup) ---
ADMIN_EMAIL=admin@yourcompany.com
ADMIN_PASSWORD=$(generate_secret)
ADMIN_NAME=IT Admin

# --- PostgreSQL ---
POSTGRES_USER=openwebui
POSTGRES_PASSWORD=$(generate_secret)
POSTGRES_DB=openwebui

# --- vLLM ---
VLLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
VLLM_TP_SIZE=2
VLLM_MAX_MODEL_LEN=8192
VLLM_API_KEY=$(generate_secret)
HF_TOKEN=hf_your_token_here

# --- Open Terminal ---
OPEN_TERMINAL_API_KEY=${TERMINAL_KEY}
TERMINAL_SERVER_CONNECTIONS='[{"url":"http://open-terminal:8000","key":"${TERMINAL_KEY}"}]'

# --- Workers ---
UVICORN_WORKERS=4

# --- Observability (optional) ---
ENABLE_OTEL=False
OTEL_ENDPOINT=

# =============================================================================
# IMPORTANT: Update the following before deploying:
#   1. WEBUI_URL - your actual domain
#   2. ADMIN_EMAIL / ADMIN_PASSWORD - your admin credentials
#   3. HF_TOKEN - your Hugging Face token (for gated models like Llama)
#   4. Place TLS certs in ./nginx/certs/ (fullchain.pem + privkey.pem)
# =============================================================================
EOF
    info ".env file created. Review and update it before starting."
    warn "Generated admin password is in .env - save it securely."
else
    warn ".env already exists. Skipping generation."
fi

# --- Pull Ollama models -----------------------------------------------------
info "Pulling recommended Ollama models..."
info "(This may take a while on first run.)"

# Start Ollama temporarily to pull models
if docker compose ps ollama 2>/dev/null | grep -q "running"; then
    info "Ollama is already running."
else
    info "Starting Ollama service to pull models..."
    docker compose up -d ollama
    sleep 10  # Wait for Ollama to initialize
fi

# Pull models (adjust to your organization's needs)
MODELS=(
    "llama3.1:8b"       # Fast - summarization, Q&A, literature triage
    "nomic-embed-text"  # Embedding model for RAG
)

for model in "${MODELS[@]}"; do
    info "Pulling ${model}..."
    # -T disables TTY allocation so the pull also works from non-interactive shells (CI, cron)
    docker compose exec -T ollama ollama pull "${model}" || warn "Failed to pull ${model}. You can pull it later."
done

info "Models pulled. You can add more models later via:"
info "  docker compose exec ollama ollama pull <model-name>"

# --- Validate Docker Compose ------------------------------------------------
info "Validating Docker Compose configuration..."
if docker compose config --quiet; then
    info "Docker Compose configuration is valid."
else
    error "Docker Compose validation failed."
fi

# --- Summary ----------------------------------------------------------------
echo ""
echo "============================================================================="
echo "  Setup complete!"
echo "============================================================================="
echo ""
echo "  Next steps:"
echo "    1. Edit .env with your domain, admin credentials, and HF token"
echo "    2. Place TLS certificates in ./nginx/certs/"
echo "       - fullchain.pem (certificate chain)"
echo "       - privkey.pem   (private key)"
echo "    3. Update nginx/nginx.conf server_name to match your domain"
echo "    4. Start the stack:  docker compose up -d"
echo "    5. Access the UI at: https://ai.yourcompany.com"
echo ""
echo "  To check service health:  docker compose ps"
echo "  To view logs:             docker compose logs -f open-webui-1"
echo "  To pull more models:      docker compose exec ollama ollama pull <model>"
echo ""
echo "============================================================================="

Environment Variable Reference

These are the same Open WebUI environment variables used in any deployment. This section highlights the ones that organizations in regulated industries may find relevant. For the full reference, see the Open WebUI documentation.

Data Retention & Visibility

Note: These descriptions explain the technical behavior of each setting. They do not constitute a compliance determination. Your organization's quality team must evaluate these controls as part of your own Computer System Validation (CSV) process.

| Variable | Value | What This Setting Does |
| --- | --- | --- |
| USER_PERMISSIONS_CHAT_DELETE | False | Disables deletion of conversation records at the application level. |
| USER_PERMISSIONS_CHAT_TEMPORARY | False | Disables temporary chats, so AI interactions are recorded. |
| ENABLE_ADMIN_CHAT_ACCESS | False | Restricts IT administrators from viewing user conversation content at the application level. |
| ENABLE_ADMIN_EXPORT | False | Disables bulk extraction of conversation records at the application level. |

Access Control

| Variable | Value | Rationale |
| --- | --- | --- |
| ENABLE_SIGNUP | False | All users provisioned via SSO or admin. No uncontrolled account creation. |
| DEFAULT_USER_ROLE | pending | New SSO users require explicit admin approval before accessing any AI capabilities. |
| BYPASS_MODEL_ACCESS_CONTROL | False | Enforces RBAC model restrictions - users only see models assigned to their functional group. |
| BYPASS_ADMIN_ACCESS_CONTROL | False | Admins are subject to the same workspace access rules as regular users. |
| ENABLE_COMMUNITY_SHARING | False | No data, prompts, or model configurations shared to external community hubs. |
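
A validation team may want evidence that a running node actually received the retention and access-control values described above. The sketch below compares a `KEY=VALUE` dump against the values used in this guide; `check_settings` is a hypothetical helper. It only inspects the environment passed to the container, so settings later changed through the admin interface would not be reflected.

```shell
# Compare KEY=VALUE lines on stdin against the values used in this guide.
# Prints one line per mismatch and returns non-zero if any setting differs.
check_settings() {
    local env_dump expected key want got status=0
    env_dump=$(cat)
    for expected in \
        "ENABLE_SIGNUP=False" \
        "DEFAULT_USER_ROLE=pending" \
        "ENABLE_ADMIN_CHAT_ACCESS=False" \
        "ENABLE_ADMIN_EXPORT=False" \
        "BYPASS_MODEL_ACCESS_CONTROL=False" \
        "BYPASS_ADMIN_ACCESS_CONTROL=False" \
        "ENABLE_COMMUNITY_SHARING=False" \
        "USER_PERMISSIONS_CHAT_DELETE=False" \
        "USER_PERMISSIONS_CHAT_TEMPORARY=False"
    do
        key=${expected%%=*}
        want=${expected#*=}
        got=$(printf '%s\n' "$env_dump" | sed -n "s/^${key}=//p" | head -n 1)
        if [ "$got" != "$want" ]; then
            echo "MISMATCH ${key}: expected ${want}, got ${got:-<unset>}"
            status=1
        fi
    done
    return "$status"
}

# Usage against a running node (service name as in the Compose file):
#   docker compose exec -T open-webui-1 env | check_settings
```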

RAG & Retrieval

| Variable | Value | Rationale |
| --- | --- | --- |
| VECTOR_DB | pgvector | Uses PostgreSQL's PGVector extension - one database for both application data and vector search. |
| RAG_TOP_K | 5 | Returns the top 5 most relevant document chunks. Tune based on document density. |
| ENABLE_RAG_HYBRID_SEARCH | True | BM25 + vector ensemble search with reranking. Recommended for scientific documents where exact terminology matters alongside semantic similarity. |

Infrastructure

| Variable | Value | Rationale |
| --- | --- | --- |
| DATABASE_URL | postgresql://... | Required for multi-node. SQLite cannot handle concurrent writes. |
| REDIS_URL | redis://redis:6379/0 | Session coordination across stateless nodes. |
| WEBSOCKET_MANAGER | redis | Routes streaming responses through Redis for multi-node consistency. |
| ENABLE_DB_MIGRATIONS | True (node-1 only) | Only one node should run migrations on startup to prevent race conditions. |
| TERMINAL_SERVER_CONNECTIONS | JSON array | Pre-configures the Open Terminal connection so it's available to all users on startup. Format: [{"url":"http://open-terminal:8000","key":"<API_KEY>"}]. Can also be configured manually in Admin Settings → Integrations. |

Redis note: The timeout 1800 setting in the Docker Compose Redis config is critical. Without it, idle connections accumulate until maxclients is exhausted and all logins fail. See the Open WebUI Redis documentation.
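
To watch for the connection build-up described in the Redis note, the client count can be compared against `maxclients`. This is a minimal sketch: `check_redis_clients` is a hypothetical helper that parses the output of `redis-cli info clients`, and the 80% threshold in the usage line is an arbitrary example rather than a recommendation.

```shell
# Read "redis-cli info clients" output on stdin and warn when connected clients
# reach a given percentage of the configured limit.
# $1 = maxclients (10000 in the Compose file), $2 = warning threshold in percent
check_redis_clients() {
    local max=$1 pct=$2 connected
    # Redis terminates INFO lines with CRLF, so strip carriage returns first
    connected=$(tr -d '\r' | sed -n 's/^connected_clients://p')
    if [ -z "$connected" ]; then
        echo "connected_clients not found in input" >&2
        return 2
    fi
    if [ $(( connected * 100 )) -ge $(( max * pct )) ]; then
        echo "WARN: ${connected}/${max} Redis clients in use"
        return 1
    fi
    echo "OK: ${connected}/${max} Redis clients in use"
}

# Usage:
#   docker compose exec -T redis redis-cli info clients | check_redis_clients 10000 80
#   docker compose exec -T redis redis-cli config get timeout    # should report 1800
```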

Open Terminal

| Variable | Value | Rationale |
| --- | --- | --- |
| OPEN_TERMINAL_API_KEY | Generated secret | Bearer API key for authenticating requests from Open WebUI to the terminal container. |
| OPEN_TERMINAL_PIP_PACKAGES | rdkit-pypi scikit-learn lifelines matplotlib seaborn | Pre-installs scientific Python libraries at container startup. Scientists can install additional packages at runtime. |
| OPEN_TERMINAL_MAX_SESSIONS | 16 | Maximum concurrent interactive terminal sessions. Prevents resource exhaustion. |

Technical Controls Reference

Important: Open WebUI is a general-purpose AI platform. It is not a validated GxP system, and nothing in this guide should be interpreted as a compliance determination. When deployed with the configuration described here, Open WebUI provides technical controls that your organization's quality team can evaluate as part of their own Computer System Validation (CSV) effort. The mappings below are informational - your validation team must independently verify that each control meets your regulatory obligations.

All AI-generated content is unvalidated and must be reviewed by qualified personnel before use in any clinical, regulatory, or safety-critical context.

Technical Controls Available (Validation Required)

The following table lists technical capabilities that Open WebUI provides when configured as described in this guide. These are not compliance claims. Your validation team must independently determine whether these controls satisfy your specific regulatory obligations.

| Area | Technical Controls |
| --- | --- |
| Audit trail | Conversations are timestamped and attributed to an authenticated user. Chat deletion can be disabled at the application level via USER_PERMISSIONS_CHAT_DELETE=False. |
| System access | SSO/OIDC integration, role-based access control, DEFAULT_USER_ROLE=pending for approval workflow. |
| Authorization | RBAC restricts access to models, documents, and features by functional group. |
| Availability monitoring | Health checks, OpenTelemetry integration, and Redis session management support monitoring. |
| User identity | Individual user accounts are authenticated via SSO/OIDC. |
| Deployment model | Can be deployed on internal infrastructure with no external dependencies when models are pre-loaded. |
| Data integrity | Chat deletion can be disabled at the application level. PostgreSQL write-ahead logging (WAL). Automated backups. |
| Data migration | PostgreSQL pg_dump/pg_restore with integrity verification. Standard, well-documented process. |
| Business continuity | Stateless nodes with automatic failover, Redis HA, PostgreSQL WAL archiving for point-in-time recovery. |
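
The integrity verification mentioned for data migration can take several forms. One simple option, sketched here as an assumption rather than a description of any built-in feature, is to record a SHA-256 checksum next to each dump and re-check it before restoring. `write_checksum` and `verify_checksum` are hypothetical helpers, and the dump path is an example.

```shell
# Record a SHA-256 checksum next to a dump file.
write_checksum() {        # $1 = path to dump file
    local dir base
    dir=$(dirname "$1"); base=$(basename "$1")
    ( cd "$dir" && sha256sum "$base" > "${base}.sha256" )
}

# Re-check the dump against its recorded checksum; non-zero exit if it changed.
verify_checksum() {       # $1 = path to dump file
    local dir base
    dir=$(dirname "$1"); base=$(basename "$1")
    ( cd "$dir" && sha256sum -c "${base}.sha256" >/dev/null 2>&1 )
}

# Possible backup run (POSTGRES_USER and POSTGRES_DB are already set inside the container):
#   docker compose exec -T postgres \
#       sh -c 'pg_dump -U "$POSTGRES_USER" -Fc "$POSTGRES_DB"' > backups/openwebui.dump
#   write_checksum backups/openwebui.dump
#   verify_checksum backups/openwebui.dump || echo "dump changed since backup" >&2
```

A checksum only shows that the file is unchanged since it was written; whether that satisfies your data-integrity requirements is a question for your validation team.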

What This Might Mean for a Validation Team

If your organization uses a risk-based approach to CSV (Computer System Validation), the GAMP categorization, validation scope, and testing depth are decisions your validation team must make based on your specific deployment, customizations, and intended use. Open WebUI's open-source codebase and container-based deployment with version-pinned images may facilitate aspects of your validation process, but the validation strategy itself is an organizational responsibility.


RBAC Configuration Guide

After first deployment, you can configure functional groups via the Admin Panel. The following is an example workflow - your organization should design its own group structure based on its functional areas, risk profile, and governance requirements.

Step 1: Configure OAuth / SSO

Navigate to Admin Panel → Settings → OAuth and configure your identity provider:

[Screenshot: OAuth configuration settings]

OPENID_PROVIDER_URL=https://login.yourcompany.com/.well-known/openid-configuration
OAUTH_CLIENT_ID=<your-client-id>
OAUTH_CLIENT_SECRET=<your-client-secret>
OAUTH_SCOPES=openid email profile groups
OAUTH_GROUP_CLAIM=groups
ENABLE_OAUTH_GROUP_MANAGEMENT=True
ENABLE_OAUTH_GROUP_CREATION=True
ENABLE_OAUTH_ROLE_MANAGEMENT=True

Tip: Set ENABLE_OAUTH_GROUP_MANAGEMENT=True so that functional group membership syncs automatically from your identity provider. When a scientist transfers from R&D to Medical Affairs in your directory, their Open WebUI permissions update on next login - no manual reprovisioning.
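
When group sync does not behave as expected, it often helps to inspect the claims your identity provider actually places in the token. The sketch below decodes the payload segment of a JWT so the `groups` claim can be read. `jwt_payload` is a hypothetical helper; it does not verify the signature and is meant for debugging only.

```shell
# Decode the payload (middle segment) of a JWT so its claims can be inspected.
# No signature verification is performed.
jwt_payload() {           # $1 = raw JWT (header.payload.signature)
    local seg
    # base64url uses "-" and "_" where standard base64 uses "+" and "/"
    seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
    # base64url omits padding; restore it so base64 accepts the input
    while [ $(( ${#seg} % 4 )) -ne 0 ]; do
        seg="${seg}="
    done
    printf '%s' "$seg" | base64 --decode
}

# Usage: paste an ID token obtained from your identity provider's test tooling
#   jwt_payload "$ID_TOKEN"
# and check that the claim named in OAUTH_GROUP_CLAIM lists the expected groups.
```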

Step 2: Create Functional Groups

Navigate to Admin Panel → Groups and create groups matching your organization's structure:

[Screenshot: Groups management page]

  1. Biostatistics

    • Models: All available models
    • Knowledge bases: Analysis datasets, statistical analysis plans, CDISC standards libraries
    • Permissions: Open Terminal enabled, code interpreter enabled, file upload enabled
    • Rationale: Biostatisticians run survival analyses, enrollment dashboards, and forest plots. Open Terminal gives them a computational environment without requiring IT tickets.
  2. Clinical Development

    • Models: All available models
    • Knowledge bases: Study protocols, investigator brochures, CRF templates, monitoring plan libraries
    • Permissions: Document extraction enabled, file upload enabled, web search enabled
    • Rationale: Clinical teams work with both internal protocols and public clinical trial registries. Web search enables ClinicalTrials.gov lookups. Document extraction helps process regulatory correspondence.
  3. Manufacturing / CMC

    • Models: All available models
    • Knowledge bases: Batch records, process validation reports, equipment SOPs
    • Permissions: Open Terminal enabled, file upload enabled
    • Rationale: CMC scientists frequently upload batch records and deviation reports for AI-assisted root cause analysis. Open Terminal enables batch trend analysis and process parameter visualization.
  4. Medical Affairs

    • Models: All available models
    • Knowledge bases: Product monographs, congress abstracts, KOL slide decks
    • Permissions: Web search enabled, file upload enabled
    • Rationale: Medical Affairs teams need access to public literature and congress proceedings alongside internal medical information.
  5. Pharmacovigilance

    • Models: Reasoning models only (e.g., Llama 3.1 70B via vLLM)
    • Knowledge bases: MedDRA dictionaries, CIOMS forms, signal detection SOPs
    • Permissions: RAG-only mode (no web search, no file upload)
    • Rationale: PV work is safety-critical. Restricting to RAG-only mode prioritizes retrieval from curated internal documents and disables web search, reducing exposure to uncontrolled external content. The underlying model may still draw on its training data.
  6. R&D / Discovery

    • Models: All available models
    • Knowledge bases: Compound libraries, assay protocols, literature databases
    • Permissions: Open Terminal enabled, code interpreter enabled, file upload enabled, web search enabled
    • Rationale: Discovery scientists need the broadest toolset - running SAR analyses and molecular modeling in Open Terminal, uploading proprietary assay results, and searching public literature.
  7. Regulatory Affairs

    • Models: All available models
    • Knowledge bases: eCTD templates, FDA/EMA guidance, precedent correspondence
    • Permissions: Document extraction enabled, file upload enabled
    • Rationale: Regulatory scientists frequently need to extract structured data from FDA letters, EMA assessment reports, and deficiency notices.
  8. Support Staff

    • Models: Small models only (e.g., Llama 3.1 8B via Ollama)
    • Knowledge bases: Company policies, HR procedures, training materials
    • Permissions: No file upload, no web search, no terminal access
    • Rationale: Minimal access footprint for non-scientific users.

Step 3: Assign Models to Groups

For each model in Admin Panel → Models:

Model access control settings

  1. Set visibility to Private (not Public)
  2. Under Access Control, add the groups that should have access
  3. Ensure BYPASS_MODEL_ACCESS_CONTROL=False in your environment so these restrictions are enforced

Step 4: Assign Knowledge Bases to Groups

For each knowledge base in Admin Panel → Knowledge:

Knowledge base access control

  1. Set access control to the relevant functional groups
  2. Users will only see knowledge bases assigned to their group(s) in the chat interface

Step 5: Install the Inline Visualizer Tool & Skill

The Inline Visualizer plugin renders interactive HTML/SVG visualizations directly in the chat. It includes a theme-aware design system with color ramps, SVG utility classes, and a communication bridge that lets visualizations send prompts back to the chat for conversational exploration.

This plugin has two components:

| Component | File | Install Location |
|---|---|---|
| Tool | tool.py | Workspace → Tools |
| Skill | SKILL.md | Workspace → Knowledge → Create Skill |

Install the Tool:

  1. Copy the contents of tool.py
  2. In Open WebUI, go to Workspace → Tools → + Create New
  3. Paste the code and click Save

Install the Skill:

  1. Copy the contents of SKILL.md
  2. In Open WebUI, go to Workspace → Knowledge → + Create Skill
  3. Name it visualize (this exact name is required)
  4. Paste the contents and click Save

Attach to Models:

  1. Go to Admin Panel → Models and edit each model that should support visualizations
  2. Under Tools, enable the Inline Visualizer tool
  3. Under Skills, attach the visualize skill
  4. Ensure native function calling is enabled for the model
  5. Save

Enable Interactive Features (Optional):

  1. Go to Settings → Interface
  2. Enable iframe Sandbox Allow Same Origin

Without this, visualizations render normally but interactive buttons that send prompts back to the chat (sendPrompt) will not work.

Tip: A strong model is required for complex, visually detailed interactive visualizations. Tested with Claude Haiku 4.5 and Claude Opus 4.5.


Knowledge Base Setup Guide

Open WebUI's RAG system ingests documents and creates searchable vector embeddings in PGVector. This section provides an example knowledge base design for pharmaceutical contexts.

Recommended Knowledge Base Structure

| Knowledge Base | Contents | Functional Groups | Notes |
|---|---|---|---|
| Compound Library | Structures, SAR data, screening results, MoA summaries | R&D / Discovery | High sensitivity - restrict strictly to R&D |
| Assay Protocols | Standard assay procedures, validation data, reference standards | R&D / Discovery | |
| Clinical Protocols | Study protocols, ICH E6/E8/E9 references, SAPs | Clinical Development | |
| CRF Templates | Case report forms, data management plans, reconciliation guides | Clinical Development | |
| Statistical Methods | SAPs, CDISC standards, analysis dataset specifications | Biostatistics | |
| Regulatory Guidance | FDA guidance library, EMA guidelines, ICH harmonized guidelines | Regulatory Affairs | Consider splitting by region (FDA/EMA/PMDA) |
| Submission Templates | eCTD module templates, cover letters, precedent review correspondence | Regulatory Affairs | |
| PV Reference | MedDRA hierarchy, CIOMS forms, signal detection SOPs, PSUR templates | Pharmacovigilance | Reviewed documents only - no drafts |
| Manufacturing SOPs | Batch records, process validation reports, equipment qualification docs | Manufacturing / CMC | |
| Medical Information | Product monographs, SmPCs, congress posters, medical response letters | Medical Affairs | |
| Company Policies | HR handbook, compliance policies, IT security procedures, training guides | All groups | |

Upload Workflow

  1. Navigate to Workspace → Knowledge → Create Knowledge Base

Knowledge base upload workflow

  2. Name the knowledge base and set the access control to the relevant functional group(s)
  3. Upload documents (supported formats: PDF, DOCX, TXT, Markdown, HTML, CSV, XLSX, PPTX)
  4. Open WebUI automatically:

    • Extracts text from uploaded documents
    • Chunks the content for optimal retrieval
    • Generates vector embeddings and stores them in PGVector
  5. Users in the assigned groups can now reference this knowledge base in chat by typing # followed by the knowledge base name

RAG Best Practices

  • Regulatory submissions: Large eCTD modules should be split by section (e.g., upload Module 2.3 Quality Overall Summary separately from Module 3.2.P Drug Product). Smaller, section-scoped documents keep retrieved chunks on topic, which improves retrieval precision.
  • SOPs and batch records: These are typically well-structured documents that RAG handles effectively. Use descriptive filenames that include the SOP number and revision (e.g., SOP-MFG-042-Rev3-Tablet-Compression.pdf).
  • Literature databases: For large literature collections (1,000+ papers), consider organizing into topic-specific knowledge bases rather than one monolithic collection. This lets users target their retrieval.
  • Citation verification: RAG provides relevance scores with each retrieved chunk. Scientists must always verify citations against the source - RAG reduces hallucination but does not eliminate it. All AI-generated content must be reviewed by qualified personnel before use in any clinical, regulatory, or safety-critical context. This is especially critical for PV and regulatory use cases.
  • Version control: When SOPs are revised or guidance documents update, upload the new version and remove the old one. Knowledge bases can be updated without downtime. Maintain a document version log outside Open WebUI for your QMS.
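
The filename recommendation above can be checked before upload. The sketch below assumes a hypothetical naming convention, SOP-<DEPT>-<NNN>-Rev<N>-<Title>, inferred from the single example filename in this guide; the regular expression and the function name `sop_filename_ok` are illustrative and should be adapted to your QMS numbering scheme.

```shell
# Hypothetical convention: SOP-<DEPT>-<NNN>-Rev<N>-<Title>.pdf or .docx
# Returns 0 when the filename matches, non-zero otherwise.
sop_filename_ok() {
    printf '%s\n' "$1" \
        | grep -Eq '^SOP-[A-Z]+-[0-9]{3}-Rev[0-9]+-[A-Za-z0-9-]+\.(pdf|docx)$'
}
```

A staging directory could then be screened with a loop such as `for f in staging/*; do sop_filename_ok "$(basename "$f")" || echo "rename before upload: $f"; done`, where `staging/` is a placeholder path.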

Embedding Model Selection

The Docker Compose stack pulls nomic-embed-text via Ollama for generating embeddings locally. Configure this in Admin Panel → Settings → Documents → Embedding Model.

For higher-quality embeddings (recommended for 10,000+ document deployments), consider using a dedicated embedding endpoint. Set RAG_OPENAI_API_BASE_URL to point to a self-hosted embedding service or use Ollama's built-in embedding support (both options keep data on your infrastructure when configured accordingly).
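
As a rough illustration, the two options might look like this in .env. The endpoint URL and the placeholder values are hypothetical, and the variable names should be confirmed against the Environment Variable Reference for your Open WebUI version. Embeddings produced by different models are not comparable, so switching models generally means re-embedding existing documents.

```shell
# Option A (default in this stack): embeddings generated locally by Ollama
RAG_EMBEDDING_ENGINE=ollama
RAG_EMBEDDING_MODEL=nomic-embed-text

# Option B: dedicated OpenAI-compatible embedding service on your own network
# (hypothetical internal hostname; replace the placeholders before use)
# RAG_EMBEDDING_ENGINE=openai
# RAG_EMBEDDING_MODEL=<model served by your endpoint>
# RAG_OPENAI_API_BASE_URL=http://embeddings.internal:8080/v1
# RAG_OPENAI_API_KEY=<key for the internal service>
```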


Security Hardening Checklist

The following checklist describes operational security measures. This is not a compliance checklist - your organization's quality, security, and compliance teams should determine which items apply to your environment and what additional measures are needed.

Network Layer

  • TLS 1.2+ enforced on the reverse proxy - no plaintext HTTP traffic reaches Open WebUI
  • Only port 443 is exposed to the user network; all other services are on internal Docker network
  • HSTS header set with max-age=63072000; includeSubDomains
  • Security headers: X-Content-Type-Options: nosniff, X-Frame-Options: SAMEORIGIN, Referrer-Policy: strict-origin-when-cross-origin
  • Rate limiting configured on the reverse proxy to prevent abuse
  • DNS resolves only to the reverse proxy - no direct access to application nodes
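
One way to spot-check the header items above is to compare a response's headers against the expected values. This is a minimal sketch: the function name `check_security_headers` is illustrative, the expected values are copied from this checklist, and matching is a case-insensitive substring test, so it confirms presence rather than validating the full policy.

```shell
# Expected headers, taken from the Network Layer checklist.
REQUIRED_HEADERS='strict-transport-security: max-age=63072000; includeSubDomains
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
referrer-policy: strict-origin-when-cross-origin'

# Reads raw response headers on stdin (e.g. from `curl -sI`).
# Prints each missing header and exits non-zero if any are absent.
check_security_headers() (
    headers=$(tr -d '\r')          # strip CRs from HTTP line endings
    status=0
    while IFS= read -r expected; do
        if ! printf '%s\n' "$headers" | grep -Fiq -- "$expected"; then
            echo "[MISSING] $expected"
            status=1
        fi
    done <<EOF
$REQUIRED_HEADERS
EOF
    exit "$status"
)
```

Usage against a live deployment would look like `curl -sI https://your-hostname/ | check_security_headers`, with your own hostname substituted.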

Authentication & Authorization

  • ENABLE_SIGNUP=False - no self-registration
  • DEFAULT_USER_ROLE=pending - new SSO users require admin approval
  • SSO/OIDC configured with your organization's identity provider
  • ENABLE_OAUTH_GROUP_MANAGEMENT=True - groups sync from IdP
  • ENABLE_OAUTH_ROLE_MANAGEMENT=True - roles sync from IdP
  • BYPASS_MODEL_ACCESS_CONTROL=False - RBAC enforced on model access
  • BYPASS_ADMIN_ACCESS_CONTROL=False - admins subject to workspace ACLs

Data Protection

  • ENABLE_ADMIN_CHAT_ACCESS=False - restricts IT administrators from viewing user conversation content at the application level
  • ENABLE_ADMIN_EXPORT=False - disables bulk data extraction at the application level
  • USER_PERMISSIONS_CHAT_DELETE=False - disables chat deletion at the application level
  • USER_PERMISSIONS_CHAT_TEMPORARY=False - no unlogged conversations
  • ENABLE_COMMUNITY_SHARING=False - no external data sharing
  • PostgreSQL configured with encryption at rest (transparent data encryption or full-disk encryption on the host)
  • Redis requirepass set if Redis is network-accessible (not needed when Redis is internal-only via Docker network)
  • Backup encryption enabled (see Backup & Disaster Recovery)
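
The environment-variable items in the two checklists above can be verified mechanically. The following sketch compares a .env file against the values listed in this guide; the function name `check_env_hardening` is illustrative. It requires an exact `KEY=value` line, so quoted or differently cased values are reported as failures, and it inspects only the file, not what the running instance applies.

```shell
# Expected values, copied from the Authentication & Authorization
# and Data Protection checklists.
EXPECTED_SETTINGS='ENABLE_SIGNUP=False
DEFAULT_USER_ROLE=pending
ENABLE_OAUTH_GROUP_MANAGEMENT=True
ENABLE_OAUTH_ROLE_MANAGEMENT=True
BYPASS_MODEL_ACCESS_CONTROL=False
BYPASS_ADMIN_ACCESS_CONTROL=False
ENABLE_ADMIN_CHAT_ACCESS=False
ENABLE_ADMIN_EXPORT=False
USER_PERMISSIONS_CHAT_DELETE=False
USER_PERMISSIONS_CHAT_TEMPORARY=False
ENABLE_COMMUNITY_SHARING=False'

# Usage: check_env_hardening /path/to/.env
# Prints one line per missing or differing setting; exits non-zero on any failure.
check_env_hardening() (
    env_file="$1"
    status=0
    while IFS= read -r setting; do
        # -F: literal match, -x: the whole line must equal the setting
        if ! grep -Fxq -- "$setting" "$env_file"; then
            echo "[FAIL] expected ${setting} in ${env_file}"
            status=1
        fi
    done <<EOF
$EXPECTED_SETTINGS
EOF
    exit "$status"
)
```

Running this in a deployment pipeline before `docker compose up -d` gives an early warning when a hardening setting drifts.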

Model & Inference Security

  • When configured for local-only inference, all models run via Ollama or vLLM on your infrastructure
  • Hugging Face token is stored only in .env, not committed to version control
  • .env file has restrictive permissions: chmod 600 .env
  • For production deployments, consider migrating secrets from .env to a dedicated secrets manager (e.g., HashiCorp Vault, AWS Secrets Manager)
  • vLLM API key (VLLM_API_KEY) is set to prevent unauthorized direct access to the inference endpoint
  • Docker image tags pinned to specific versions (not :main or :latest) for reproducible, auditable deployments
  • If Functions are used: LLM-Guard or equivalent function installed for prompt injection scanning
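
Two of the items above, .env permissions and pinned image tags, lend themselves to a quick scripted check. This is a sketch under simple assumptions: each image is declared on its own `image:` line without quotes, and the function names are illustrative. Images pinned by digest (`@sha256:`) are treated as pinned.

```shell
# Prints every image reference that is untagged or tagged :latest / :main.
# Usage: find_unpinned_images docker-compose.yml
find_unpinned_images() {
    awk '$1 == "image:" {
        ref = $2
        if (index(ref, "@sha256:") > 0) next   # pinned by digest
        n = split(ref, parts, "/")
        last = parts[n]                         # final path component, e.g. "open-webui:main"
        if (last !~ /:/ || last ~ /:(latest|main)$/) print ref
    }' "$1"
}

# Succeeds only when the file mode is exactly 600 (owner read/write).
# Usage: env_file_is_private .env
env_file_is_private() {
    [ "$(ls -l "$1" | cut -c1-10)" = "-rw-------" ]
}
```

An empty result from `find_unpinned_images docker-compose.yml` indicates that every image carries an explicit, non-floating tag.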

Open Terminal Security

  • OPEN_TERMINAL_API_KEY is set — without it, anyone who can reach the port has full access
  • Open Terminal container is on the internal Docker network only — not exposed to external traffic
  • Resource limits applied: memory: 4G and cpus: 4.0 (or appropriate for your environment)
  • OPEN_TERMINAL_MAX_SESSIONS=16 to prevent resource exhaustion from concurrent terminal sessions
  • Docker socket is not mounted (/var/run/docker.sock) — unless explicitly required and the environment is trusted
  • Named volume mounted at /home/user for file persistence across container restarts
  • Open Terminal access restricted to appropriate functional groups via Admin Settings → Integrations

Operational Security

  • ENABLE_DB_MIGRATIONS=True on exactly one node; False on all others
  • Redis maxclients set to 10000+ and timeout set to 1800
  • Log aggregation configured (OpenTelemetry, Splunk, Datadog, or equivalent)
  • Alerting on container restarts, database connection failures, and GPU memory exhaustion
  • Docker image tags pinned to specific versions in production (not :main or :latest)
  • Regular security updates for base images (docker compose pull && docker compose up -d)

Backup & Disaster Recovery

What to Back Up

| Component | Data Location | Backup Method |
|---|---|---|
| PostgreSQL | postgres-data volume | pg_dump (logical) or continuous WAL archiving |
| Redis | redis-data volume | AOF + RDB snapshots (handled by Redis config) |
| Ollama models | ollama-data volume | Volume snapshot or re-pull (models are public) |
| Open WebUI data | owui-data volume | Volume snapshot |
| Open Terminal data | terminal-data volume | Volume snapshot (or ephemeral - rebuild on demand) |
| Configuration | .env, nginx/, docker-compose.yml | Git repository (exclude secrets) |
| TLS certificates | nginx/certs/ | Certificate management system |

Automated PostgreSQL Backup Script

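Save the script below as /opt/openwebui/backup-postgres.sh, make it executable (chmod +x), and schedule it with cron (crontab -e) or your scheduling system:

#!/usr/bin/env bash
# Daily PostgreSQL backup - run via cron at 02:00 (cron uses the system time zone)
# 0 2 * * * /opt/openwebui/backup-postgres.sh

set -euo pipefail

# cron does not start in the project directory; docker compose needs it.
# Assumes this script sits next to docker-compose.yml.
cd "$(dirname "$0")"

BACKUP_DIR="/opt/openwebui/backups"
RETENTION_DAYS=30
TIMESTAMP=$(date -u +"%Y%m%d_%H%M%S")
# --format=custom writes a pg_restore archive, not gzip-compressed SQL, hence .dump
BACKUP_FILE="${BACKUP_DIR}/openwebui_${TIMESTAMP}.dump"

mkdir -p "${BACKUP_DIR}"

docker compose exec -T postgres pg_dump \
    -U "${POSTGRES_USER:-openwebui}" \
    -d "${POSTGRES_DB:-openwebui}" \
    --format=custom \
    --compress=9 \
    > "${BACKUP_FILE}"

# Verify backup integrity (pg_restore runs on the host against the host-side file,
# so the PostgreSQL client tools must be installed on the host)
if pg_restore --list "${BACKUP_FILE}" > /dev/null 2>&1; then
    echo "[OK] Backup verified: ${BACKUP_FILE}"
else
    echo "[ERROR] Backup verification failed: ${BACKUP_FILE}" >&2
    exit 1  # keep older backups when the new one is unusable
fi

# Prune old backups
find "${BACKUP_DIR}" -name "openwebui_*.dump" -mtime +"${RETENTION_DAYS}" -delete

echo "[INFO] Backup complete. Size: $(du -h "${BACKUP_FILE}" | cut -f1)"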
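
The hardening checklist calls for backup encryption, which the script above does not perform. A minimal sketch of one approach follows, assuming OpenSSL 1.1.1 or later is available on the host and that a passphrase file exists at a path of your choosing. The function names are illustrative, and your security team may prefer a different tool (for example GnuPG) or storage-level encryption.

```shell
# Encrypts a backup file with AES-256 and removes the plaintext only if
# encryption succeeded. The passphrase is read from the first line of keyfile.
# Usage: encrypt_backup /path/to/backup-file /path/to/keyfile
encrypt_backup() {
    openssl enc -aes-256-cbc -pbkdf2 -salt \
        -in "$1" -out "$1.enc" -pass "file:$2" \
        && rm -f "$1"
}

# Writes the decrypted backup to stdout so it can be piped into pg_restore.
# Usage: decrypt_backup /path/to/backup-file.enc /path/to/keyfile
decrypt_backup() {
    openssl enc -d -aes-256-cbc -pbkdf2 -in "$1" -pass "file:$2"
}
```

In the backup script, `encrypt_backup "${BACKUP_FILE}" "$KEYFILE"` would run after the verification step, with `KEYFILE` being a variable you define. The pruning pattern would then need to match the `.enc` suffix as well. At restore time, the output of `decrypt_backup` can be piped to `docker compose exec -T postgres pg_restore`.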

Recovery Procedure

  1. Stop Open WebUI nodes: docker compose stop open-webui-1 open-webui-2
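  2. Restore PostgreSQL: docker compose exec -T postgres pg_restore -U openwebui -d openwebui --clean --if-exists < /opt/openwebui/backups/<backup-file> (the backup is a pg_dump custom-format archive; pass it to pg_restore as-is, without running gunzip first)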
  3. Restart: docker compose up -d
  4. Verify: Check health endpoints and run a test query

RPO / RTO Targets

| Scenario | RPO (Data Loss) | RTO (Downtime) |
|---|---|---|
| Single node failure | 0 (stateless, auto-recovered) | < 30 seconds (health check interval) |
| Database corruption | ≤ 24 hours (daily backups) | < 1 hour |
| Full infrastructure loss | ≤ 24 hours | 2–4 hours (restore from backups) |
| With WAL archiving | ≤ 5 minutes | < 1 hour |

For mission-critical deployments, enable PostgreSQL WAL archiving for point-in-time recovery with an RPO of minutes rather than hours.
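
As a starting point, the PostgreSQL settings involved might look like the sketch below, which prints the SQL rather than applying it. The helper name `wal_archiving_sql` and the archive directory are hypothetical; the directory must be a mounted volume writable by the postgres user. The `archive_command` follows the pattern shown in the PostgreSQL documentation, and `archive_timeout = 300` forces a WAL segment switch at least every five minutes, which is where a five-minute RPO would come from.

```shell
# Prints ALTER SYSTEM statements that enable WAL archiving into the given directory.
# Usage: wal_archiving_sql /path/to/archive-dir
wal_archiving_sql() (
    archive_dir="$1"
    cat <<SQL
ALTER SYSTEM SET wal_level = 'replica';
ALTER SYSTEM SET archive_mode = 'on';
ALTER SYSTEM SET archive_timeout = '300';
ALTER SYSTEM SET archive_command = 'test ! -f ${archive_dir}/%f && cp %p ${archive_dir}/%f';
SQL
)
```

The output could be applied with `wal_archiving_sql /var/lib/postgresql/wal-archive | docker compose exec -T postgres psql -U openwebui -d openwebui`, followed by `docker compose restart postgres`, since `wal_level` and `archive_mode` only take effect after a restart. Point-in-time recovery also requires a base backup (for example via pg_basebackup) taken after archiving is enabled.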


This guide is maintained alongside What Would It Take for a Pharma Company to Run AI On Its Own Infrastructure?. For questions about enterprise deployment, contact sales@openwebui.com.