Technical Setup Guide

This guide is a technical reference companion to "What Would It Take for a Pharma Company to Run AI On Its Own Infrastructure?" It walks through one possible production architecture for self-hosting Open WebUI, along with configuration examples that organizations in regulated industries may find relevant. This is a starting point for evaluation, not a prescriptive deployment guide - your organization's engineering, quality, and compliance teams should adapt this architecture to your specific requirements.

Architecture Overview
Pre-Requisites
Docker Compose Reference
Setup Script
Environment Variable Reference
Technical Controls Reference
RBAC Configuration Guide
Knowledge Base Setup Guide
Security Hardening Checklist
Backup & Disaster Recovery

Architecture Overview

The production stack is identical to any Open WebUI enterprise deployment: reverse proxy with TLS, stateless application nodes, PostgreSQL + PGVector for data and vector search, Redis for session coordination, local inference via Ollama and vLLM, and Open Terminal for sandboxed code execution. See the blog post for the architecture diagram and rationale.

This deployment also includes the Inline Visualizer tool and skill, which renders interactive HTML/SVG visualizations directly in the chat. When combined with Open Terminal for computational work, this gives scientists two complementary paths to visual output: Open Terminal for publication-quality figures generated by matplotlib, RDKit, and other scientific Python libraries, and Inline Visualizer for interactive diagrams, flowcharts, and explorable visuals rendered natively in the conversation.

What makes this configuration different from a generic deployment isn't the infrastructure - it's the configuration layer on top: how access is structured, how knowledge bases map to functional groups, and how audit records are retained. The settings below are examples of how organizations in regulated industries have approached these decisions.

Example configuration decisions:

ENABLE_ADMIN_CHAT_ACCESS=False - Restricts IT administrators from viewing user conversation content at the application level.
USER_PERMISSIONS_CHAT_DELETE=False + USER_PERMISSIONS_CHAT_TEMPORARY=False - Disables chat deletion and temporary chats at the application level, so AI interactions are persisted with timestamps.
ENABLE_COMMUNITY_SHARING=False - Disables sharing of data, prompts, or model configurations externally.
BYPASS_MODEL_ACCESS_CONTROL=False - Enforces functional group boundaries. A CMC scientist sees manufacturing models and documents; a PV officer sees pharmacovigilance resources. This helps prevent cross-group data exposure.
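
One way to confirm that a deployed node actually carries these values is to compare its environment against the expected list. The sketch below is illustrative only: `check_flags` is a hypothetical helper, not part of Open WebUI, and it assumes the node's environment is supplied as `KEY=VALUE` lines on standard input.

```shell
#!/usr/bin/env bash
# check_flags: hypothetical helper (not part of Open WebUI).
# Reads KEY=VALUE lines on stdin and reports any flag whose value is
# missing or differs from the example value listed in this guide.
check_flags() {
    local env_dump failures=0 pair key want got
    env_dump=$(cat)
    for pair in \
        ENABLE_ADMIN_CHAT_ACCESS=False \
        USER_PERMISSIONS_CHAT_DELETE=False \
        USER_PERMISSIONS_CHAT_TEMPORARY=False \
        ENABLE_COMMUNITY_SHARING=False \
        BYPASS_MODEL_ACCESS_CONTROL=False
    do
        key=${pair%%=*}
        want=${pair#*=}
        # Take the last assignment of the key; empty if the key is absent.
        got=$(printf '%s\n' "$env_dump" | sed -n "s/^${key}=//p" | tail -n 1)
        if [ "$got" != "$want" ]; then
            echo "MISMATCH ${key}: expected '${want}', found '${got:-<unset>}'"
            failures=$((failures + 1))
        fi
    done
    [ "$failures" -eq 0 ]
}
```

A possible invocation against a running container is `docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' owui-node-1 | check_flags`. Because the stack sets ENABLE_PERSISTENT_CONFIG=True, values changed later in the Admin Panel may live in the database rather than the environment, so treat this as a check of the deployed configuration, not of runtime state.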

Pre-Requisites

Hardware Requirements

For an organization with 500–10,000+ employees and concurrent usage of ~100–500 users:

Component	Minimum	Recommended
Open WebUI nodes	2× (4 vCPU, 8 GB RAM each)	3× (8 vCPU, 16 GB RAM each)
PostgreSQL	4 vCPU, 16 GB RAM, 500 GB SSD	8 vCPU, 32 GB RAM, 1 TB NVMe
Redis	2 vCPU, 4 GB RAM	2 vCPU, 8 GB RAM (Sentinel: 3 nodes)
Ollama (small models, ≤13B)	1× NVIDIA GPU (24 GB VRAM, e.g., RTX 4090)	2× GPUs behind load balancer
vLLM (large models, 70B+)	2× NVIDIA A100 80 GB (tensor parallel)	4× A100 80 GB or 2× H100
Shared storage	1 TB S3-compatible or NFS	5 TB+ with lifecycle policies
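
As a rough cross-check of the vLLM row, the arithmetic below estimates weight memory against the VRAM budget. It is a back-of-the-envelope sketch: the rule of thumb (parameters × bytes per parameter) is an assumption that ignores KV cache, activations, and framework overhead, and the helper names are hypothetical. The 90% figure mirrors the `--gpu-memory-utilization 0.90` flag used in the Compose file.

```shell
#!/usr/bin/env bash
# Rough rule of thumb (an assumption, not a vendor sizing formula):
# weight memory in GB ~= parameters (billions) x bytes per parameter.
weights_gb() {
    local params_billion=$1 bytes_per_param=$2
    echo $(( params_billion * bytes_per_param ))
}

# VRAM budget in GB given GPU count, per-GPU memory, and the percentage
# of memory the inference server is allowed to use.
vram_budget_gb() {
    local gpus=$1 gb_per_gpu=$2 utilization_pct=$3
    echo $(( gpus * gb_per_gpu * utilization_pct / 100 ))
}

# 70B parameters at 16-bit precision (2 bytes each) on 2x 80 GB GPUs at 90%.
need=$(weights_gb 70 2)
have=$(vram_budget_gb 2 80 90)
echo "weights: ${need} GB, budget: ${have} GB, headroom: $(( have - need )) GB"
```

Under these assumptions the minimum configuration leaves only a few gigabytes for everything other than weights, which is one reason to treat it as a floor and to keep the context length modest on two GPUs.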

Software Requirements

Docker Engine ≥ 24.0 and Docker Compose ≥ 2.20
NVIDIA Container Toolkit (for GPU nodes) - installation guide
TLS certificates - from your organization's internal CA or Let's Encrypt
LDAP / SSO credentials - for OAuth/OIDC integration (Okta, Azure AD, Ping Identity, etc.)
DNS entry - e.g., ai.yourcompany.com pointing to the reverse proxy

Network Requirements

All services communicate on an internal Docker network - no public exposure except the reverse proxy
Outbound internet access is not required if models are pre-pulled (fully air-gappable)
Ports: only 443 (HTTPS) serves traffic externally; port 80 is published solely to redirect HTTP requests to HTTPS

Docker Compose Reference

Save this as docker-compose.yml in your deployment directory. An accompanying .env file is generated by the setup script below.

# =============================================================================
# Open WebUI - Production Stack
# =============================================================================
# Usage:
#   1. Run ./setup.sh to generate .env and required directories
#   2. docker compose up -d
#   3. Access via https://ai.yourcompany.com
# =============================================================================

services:
  # ---------------------------------------------------------------------------
  # Reverse Proxy - TLS termination and load balancing
  # ---------------------------------------------------------------------------
  nginx:
    image: nginx:alpine
    container_name: owui-proxy
    restart: unless-stopped
    ports:
      - "443:443"
      - "80:80"       # Redirect to HTTPS
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/certs:/etc/nginx/certs:ro
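    depends_on:
      open-webui-1:
        condition: service_healthy
      open-webui-2:   # nginx will not start if an upstream host cannot be resolved
        condition: service_healthy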
    networks:
      - owui-net

  # ---------------------------------------------------------------------------
  # Open WebUI - Stateless application nodes
  # ---------------------------------------------------------------------------
  open-webui-1:
    image: ghcr.io/open-webui/open-webui:0.6  # Pin to a specific version for production environments
    container_name: owui-node-1
    restart: unless-stopped
    environment:
      # --- Core ---
      - WEBUI_URL=${WEBUI_URL}
      - WEBUI_NAME=${WEBUI_NAME:-AI Assistant}
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}
      - PORT=8080

      # --- Database ---
      - DATABASE_URL=postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}

      # --- Vector DB (PGVector, same PostgreSQL instance) ---
      - VECTOR_DB=pgvector
      - PGVECTOR_DB_URL=postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}

      # --- Redis ---
      - REDIS_URL=redis://redis:6379/0
      - WEBSOCKET_MANAGER=redis
      - WEBSOCKET_REDIS_URL=redis://redis:6379/0
      - ENABLE_WEBSOCKET_SUPPORT=True

      # --- Inference backends ---
      - OLLAMA_BASE_URL=http://ollama:11434
      - ENABLE_OLLAMA_API=True
      - OPENAI_API_BASE_URL=http://vllm:8000/v1
      - OPENAI_API_KEY=${VLLM_API_KEY:-sk-none}
      - ENABLE_OPENAI_API=True

      # --- Security defaults ---
      - ENABLE_SIGNUP=False
      - DEFAULT_USER_ROLE=pending
      - ENABLE_ADMIN_CHAT_ACCESS=False
      - ENABLE_ADMIN_EXPORT=False
      - BYPASS_MODEL_ACCESS_CONTROL=False
      - BYPASS_ADMIN_ACCESS_CONTROL=False
      - ENABLE_COMMUNITY_SHARING=False

      # --- User permissions ---
      - USER_PERMISSIONS_CHAT_DELETE=False
      - USER_PERMISSIONS_CHAT_TEMPORARY=False

      # --- RAG tuning ---
      - RAG_TOP_K=5
      - RAG_SYSTEM_CONTEXT=True
      - ENABLE_RAG_HYBRID_SEARCH=True

      # --- Admin provisioning (first startup only) ---
      - WEBUI_ADMIN_EMAIL=${ADMIN_EMAIL}
      - WEBUI_ADMIN_PASSWORD=${ADMIN_PASSWORD}
      - WEBUI_ADMIN_NAME=${ADMIN_NAME:-IT Admin}

      # --- Workers ---
      - UVICORN_WORKERS=${UVICORN_WORKERS:-4}
      - ENABLE_DB_MIGRATIONS=True  # Only on node-1; set False on others

      # --- Observability (optional) ---
      - ENABLE_OTEL=${ENABLE_OTEL:-False}
      - OTEL_EXPORTER_OTLP_ENDPOINT=${OTEL_ENDPOINT:-}

      # --- Open Terminal integration ---
      - TERMINAL_SERVER_CONNECTIONS=${TERMINAL_SERVER_CONNECTIONS:-[]}

      # --- Persistent config ---
      - ENABLE_PERSISTENT_CONFIG=True
    volumes:
      - owui-data:/app/backend/data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - owui-net

  # Note: open-webui-2 duplicates the environment from open-webui-1 because
  # Docker Compose list-style environment blocks do not support YAML merge keys.
  # If you add or change a variable above, update it here as well.
  open-webui-2:
    image: ghcr.io/open-webui/open-webui:0.6  # Pin to a specific version for production environments
    container_name: owui-node-2
    restart: unless-stopped
    environment:
      - WEBUI_URL=${WEBUI_URL}
      - WEBUI_NAME=${WEBUI_NAME:-AI Assistant}
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}
      - PORT=8080
      - DATABASE_URL=postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}
      - VECTOR_DB=pgvector
      - PGVECTOR_DB_URL=postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}
      - REDIS_URL=redis://redis:6379/0
      - WEBSOCKET_MANAGER=redis
      - WEBSOCKET_REDIS_URL=redis://redis:6379/0
      - ENABLE_WEBSOCKET_SUPPORT=True
      - OLLAMA_BASE_URL=http://ollama:11434
      - ENABLE_OLLAMA_API=True
      - OPENAI_API_BASE_URL=http://vllm:8000/v1
      - OPENAI_API_KEY=${VLLM_API_KEY:-sk-none}
      - ENABLE_OPENAI_API=True
      - ENABLE_SIGNUP=False
      - DEFAULT_USER_ROLE=pending
      - ENABLE_ADMIN_CHAT_ACCESS=False
      - ENABLE_ADMIN_EXPORT=False
      - BYPASS_MODEL_ACCESS_CONTROL=False
      - BYPASS_ADMIN_ACCESS_CONTROL=False
      - ENABLE_COMMUNITY_SHARING=False
      - USER_PERMISSIONS_CHAT_DELETE=False
      - USER_PERMISSIONS_CHAT_TEMPORARY=False
      - RAG_TOP_K=5
      - RAG_SYSTEM_CONTEXT=True
      - ENABLE_RAG_HYBRID_SEARCH=True
      - WEBUI_ADMIN_EMAIL=${ADMIN_EMAIL}
      - WEBUI_ADMIN_PASSWORD=${ADMIN_PASSWORD}
      - WEBUI_ADMIN_NAME=${ADMIN_NAME:-IT Admin}
      - UVICORN_WORKERS=${UVICORN_WORKERS:-4}
      - ENABLE_DB_MIGRATIONS=False  # Node-1 handles migrations
      - ENABLE_OTEL=${ENABLE_OTEL:-False}
      - OTEL_EXPORTER_OTLP_ENDPOINT=${OTEL_ENDPOINT:-}
      - TERMINAL_SERVER_CONNECTIONS=${TERMINAL_SERVER_CONNECTIONS:-[]}
      - ENABLE_PERSISTENT_CONFIG=True
    volumes:
      - owui-data:/app/backend/data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - owui-net

  # ---------------------------------------------------------------------------
  # PostgreSQL 16 + PGVector - Database and vector store
  # ---------------------------------------------------------------------------
  postgres:
    image: pgvector/pgvector:pg16
    container_name: owui-postgres
    restart: unless-stopped
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=${POSTGRES_DB}
    volumes:
      - postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    networks:
      - owui-net
    # Recommended: tune postgresql.conf for your hardware
    command: >
      postgres
        -c shared_buffers=2GB
        -c effective_cache_size=6GB
        -c work_mem=64MB
        -c maintenance_work_mem=512MB
        -c max_connections=200
        -c wal_level=replica
        -c max_wal_senders=3

  # ---------------------------------------------------------------------------
  # Redis - Session management and WebSocket coordination
  # ---------------------------------------------------------------------------
  redis:
    image: redis:7-alpine
    container_name: owui-redis
    restart: unless-stopped
    command: >
      redis-server
        --maxmemory 2gb
        --maxmemory-policy allkeys-lru
        --maxclients 10000
        --timeout 1800
        --save 60 1000
        --appendonly yes
    volumes:
      - redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - owui-net

  # ---------------------------------------------------------------------------
  # Ollama - Local model inference (smaller models, ≤13B)
  # ---------------------------------------------------------------------------
  ollama:
    image: ollama/ollama:latest  # Replace "latest" with a specific version for production environments
    container_name: owui-ollama
    restart: unless-stopped
    volumes:
      - ollama-data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    networks:
      - owui-net

  # ---------------------------------------------------------------------------
  # vLLM - GPU-optimized inference (large models, 70B+)
  # ---------------------------------------------------------------------------
  vllm:
    image: vllm/vllm-openai:latest  # Replace "latest" with a specific version for production environments
    container_name: owui-vllm
    restart: unless-stopped
    ipc: host  # Tensor-parallel workers exchange data through host shared memory
    command: >
      --model ${VLLM_MODEL:-meta-llama/Llama-3.1-70B-Instruct}
      --tensor-parallel-size ${VLLM_TP_SIZE:-2}
      --max-model-len ${VLLM_MAX_MODEL_LEN:-8192}
      --gpu-memory-utilization 0.90
      --enforce-eager
      --api-key ${VLLM_API_KEY:-sk-none}
    environment:
      - HUGGING_FACE_HUB_TOKEN=${HF_TOKEN}
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: ${VLLM_TP_SIZE:-2}
              capabilities: [gpu]
    networks:
      - owui-net

  # ---------------------------------------------------------------------------
  # Open Terminal - Sandboxed code execution for scientists
  # ---------------------------------------------------------------------------
  open-terminal:
    image: ghcr.io/open-webui/open-terminal
    container_name: owui-terminal
    restart: unless-stopped
    environment:
      - OPEN_TERMINAL_API_KEY=${OPEN_TERMINAL_API_KEY}
      - OPEN_TERMINAL_PIP_PACKAGES=rdkit-pypi scikit-learn lifelines matplotlib seaborn
      - OPEN_TERMINAL_MAX_SESSIONS=16
    volumes:
      - terminal-data:/home/user
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: "4.0"
    networks:
      - owui-net

# =============================================================================
# Named Volumes
# =============================================================================
volumes:
  owui-data:
    driver: local
  postgres-data:
    driver: local
  redis-data:
    driver: local
  ollama-data:
    driver: local
  terminal-data:
    driver: local

# =============================================================================
# Network
# =============================================================================
networks:
  owui-net:
    driver: bridge
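
Because the two application nodes carry duplicated environment blocks, a small drift check can catch a variable that was updated on one node but not the other. This is a sketch under stated assumptions: `env_drift` is a hypothetical helper, and it expects each node's environment to have been dumped to a file of `KEY=VALUE` lines beforehand.

```shell
#!/usr/bin/env bash
# env_drift: hypothetical helper. Compares two files of KEY=VALUE lines and
# prints the lines that differ, ignoring ENABLE_DB_MIGRATIONS, which is
# intentionally different between node-1 and node-2.
# Returns 0 when the remaining variables match, non-zero otherwise.
env_drift() {
    local tmp_a tmp_b status=0
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
    grep -v '^ENABLE_DB_MIGRATIONS=' "$1" | sort > "$tmp_a"
    grep -v '^ENABLE_DB_MIGRATIONS=' "$2" | sort > "$tmp_b"
    diff "$tmp_a" "$tmp_b" || status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}
```

One way to produce the input files is `docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' owui-node-1 > node1.env`, repeated for `owui-node-2`, followed by `env_drift node1.env node2.env`. If other variables legitimately differ in your environment, filter them out the same way.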

Nginx Configuration

Save this as nginx/nginx.conf:

events {
    worker_connections 1024;
}

http {
    upstream openwebui {
        least_conn;
        server open-webui-1:8080;
        server open-webui-2:8080;
    }

    # Redirect HTTP to HTTPS
    server {
        listen 80;
        return 301 https://$host$request_uri;
    }

    server {
        listen 443 ssl;
        server_name ai.yourcompany.com;

        ssl_certificate     /etc/nginx/certs/fullchain.pem;
        ssl_certificate_key /etc/nginx/certs/privkey.pem;
        ssl_protocols       TLSv1.2 TLSv1.3;
        ssl_ciphers         HIGH:!aNULL:!MD5;

        # Security headers
        add_header Strict-Transport-Security "max-age=63072000; includeSubDomains" always;
        add_header X-Content-Type-Options "nosniff" always;
        add_header X-Frame-Options "SAMEORIGIN" always;
        add_header Referrer-Policy "strict-origin-when-cross-origin" always;

        # Max upload size for document ingestion
        client_max_body_size 100M;

        location / {
            proxy_pass http://openwebui;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            # WebSocket support (required for streaming responses)
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";

            # Timeouts for long-running LLM responses
            proxy_read_timeout 300s;
            proxy_send_timeout 300s;

            # Pass streamed tokens to the client as they arrive instead of buffering them
            proxy_buffering off;
        }
    }
}
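
To spot-check the proxy after startup, the response headers can be compared against the four security headers set in the configuration above. The helper below is a hypothetical sketch, not an official tool; it only inspects headers piped to it and makes no network calls itself.

```shell
#!/usr/bin/env bash
# missing_security_headers: hypothetical helper. Reads raw HTTP response
# headers on stdin and prints each security header from the nginx config
# that is absent. Returns non-zero if any are missing.
missing_security_headers() {
    local headers name missing=0
    # Normalise line endings and case so matching is header-name insensitive.
    headers=$(tr -d '\r' | tr '[:upper:]' '[:lower:]')
    for name in strict-transport-security x-content-type-options \
                x-frame-options referrer-policy
    do
        if ! printf '%s\n' "$headers" | grep -q "^${name}:"; then
            echo "missing: ${name}"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `curl -sI https://ai.yourcompany.com | missing_security_headers` prints nothing and exits zero when all four headers are present.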

Setup Script

Save this as setup.sh and run it before your first docker compose up:

#!/usr/bin/env bash
# =============================================================================
# Open WebUI - Production Setup Script
# =============================================================================
# This script creates the required directory structure, generates secrets,
# pulls initial models, and validates the environment before first boot.
#
# Usage: chmod +x setup.sh && ./setup.sh
# =============================================================================

set -euo pipefail

# --- Colors -----------------------------------------------------------------
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

info()  { echo -e "${GREEN}[INFO]${NC}  $1"; }
warn()  { echo -e "${YELLOW}[WARN]${NC}  $1"; }
error() { echo -e "${RED}[ERROR]${NC} $1"; exit 1; }

# --- Pre-flight checks ------------------------------------------------------
info "Running pre-flight checks..."

command -v docker >/dev/null 2>&1 || error "Docker is not installed."
docker compose version >/dev/null 2>&1 || error "Docker Compose v2 is not installed."

DOCKER_VERSION=$(docker version --format '{{.Server.Version}}' 2>/dev/null) || error "Cannot reach the Docker daemon. Is it running?"
info "Docker version: ${DOCKER_VERSION}"

# Check for NVIDIA GPU (optional)
if command -v nvidia-smi >/dev/null 2>&1; then
    GPU_INFO=$(nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null || echo "GPU detected but nvidia-smi query failed")
    info "GPU detected: ${GPU_INFO}"
else
    warn "No NVIDIA GPU detected. Ollama will run on CPU (slower inference)."
    warn "vLLM requires a GPU and will not start without one."
fi

# --- Create directory structure ---------------------------------------------
info "Creating directory structure..."

mkdir -p nginx/certs
mkdir -p data/ollama
mkdir -p data/postgres
mkdir -p data/redis
mkdir -p data/open-webui
mkdir -p backups

# --- Generate secrets -------------------------------------------------------
info "Generating secrets..."

generate_secret() {
    openssl rand -base64 32 | tr -d '/+=' | head -c 48
}

if [ ! -f .env ]; then
    TERMINAL_KEY=$(generate_secret)
    cat > .env << EOF
# =============================================================================
# Open WebUI - Environment Configuration
# Generated on $(date -u +"%Y-%m-%dT%H:%M:%SZ")
# =============================================================================

# --- Public URL ---
WEBUI_URL=https://ai.yourcompany.com
WEBUI_NAME=AI Assistant

# --- Secret key (used for JWT signing - KEEP THIS SECRET) ---
WEBUI_SECRET_KEY=$(generate_secret)

# --- Admin account (created on first startup) ---
ADMIN_EMAIL=admin@yourcompany.com
ADMIN_PASSWORD=$(generate_secret)
ADMIN_NAME=IT Admin

# --- PostgreSQL ---
POSTGRES_USER=openwebui
POSTGRES_PASSWORD=$(generate_secret)
POSTGRES_DB=openwebui

# --- vLLM ---
VLLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
VLLM_TP_SIZE=2
VLLM_MAX_MODEL_LEN=8192
VLLM_API_KEY=$(generate_secret)
HF_TOKEN=hf_your_token_here

# --- Open Terminal ---
OPEN_TERMINAL_API_KEY=${TERMINAL_KEY}
TERMINAL_SERVER_CONNECTIONS='[{"url":"http://open-terminal:8000","key":"${TERMINAL_KEY}"}]'

# --- Workers ---
UVICORN_WORKERS=4

# --- Observability (optional) ---
ENABLE_OTEL=False
OTEL_ENDPOINT=

# =============================================================================
# IMPORTANT: Update the following before deploying:
#   1. WEBUI_URL - your actual domain
#   2. ADMIN_EMAIL / ADMIN_PASSWORD - your admin credentials
#   3. HF_TOKEN - your Hugging Face token (for gated models like Llama)
#   4. Place TLS certs in ./nginx/certs/ (fullchain.pem + privkey.pem)
# =============================================================================
EOF
    chmod 600 .env  # The file holds secrets; restrict it to the current user
    info ".env file created. Review and update it before starting."
    warn "Generated admin password is in .env - save it securely."
else
    warn ".env already exists. Skipping generation."
fi

# --- Pull Ollama models -----------------------------------------------------
info "Pulling recommended Ollama models..."
info "(This may take a while on first run.)"

# Start Ollama temporarily to pull models
if docker compose ps ollama 2>/dev/null | grep -q "running"; then
    info "Ollama is already running."
else
    info "Starting Ollama service to pull models..."
    docker compose up -d ollama
    sleep 10  # Wait for Ollama to initialize
fi

# Pull models (adjust to your organization's needs)
MODELS=(
    "llama3.1:8b"       # Fast - summarization, Q&A, literature triage
    "nomic-embed-text"  # Embedding model for RAG
)

for model in "${MODELS[@]}"; do
    info "Pulling ${model}..."
    docker compose exec ollama ollama pull "${model}" || warn "Failed to pull ${model}. You can pull it later."
done

info "Models pulled. You can add more models later via:"
info "  docker compose exec ollama ollama pull <model-name>"

# --- Validate Docker Compose ------------------------------------------------
info "Validating Docker Compose configuration..."
if docker compose config --quiet; then
    info "Docker Compose configuration is valid."
else
    error "Docker Compose validation failed."
fi

# --- Summary ----------------------------------------------------------------
echo ""
echo "============================================================================="
echo "  Setup complete!"
echo "============================================================================="
echo ""
echo "  Next steps:"
echo "    1. Edit .env with your domain, admin credentials, and HF token"
echo "    2. Place TLS certificates in ./nginx/certs/"
echo "       - fullchain.pem (certificate chain)"
echo "       - privkey.pem   (private key)"
echo "    3. Update nginx/nginx.conf server_name to match your domain"
echo "    4. Start the stack:  docker compose up -d"
echo "    5. Access the UI at: https://ai.yourcompany.com"
echo ""
echo "  To check service health:  docker compose ps"
echo "  To view logs:             docker compose logs -f open-webui-1"
echo "  To pull more models:      docker compose exec ollama ollama pull <model>"
echo ""
echo "============================================================================="

Environment Variable Reference

These are the same Open WebUI environment variables used in any deployment. This section highlights the ones that organizations in regulated industries may find relevant. These descriptions explain what each setting does - they do not constitute compliance guidance. For the full reference, see the Open WebUI documentation.

Data Retention & Visibility

Note: These descriptions explain the technical behavior of each setting. They do not constitute a compliance determination. Your organization's quality team must evaluate these controls as part of your own Computer System Validation (CSV) process.

Variable	Value	What This Setting Does
`USER_PERMISSIONS_CHAT_DELETE`	`False`	Disables deletion of conversation records at the application level.
`USER_PERMISSIONS_CHAT_TEMPORARY`	`False`	Disables temporary chats, so AI interactions are recorded.
`ENABLE_ADMIN_CHAT_ACCESS`	`False`	Restricts IT administrators from viewing user conversation content at the application level.
`ENABLE_ADMIN_EXPORT`	`False`	Disables bulk extraction of conversation records at the application level.

Access Control

Variable	Value	Rationale
`ENABLE_SIGNUP`	`False`	All users provisioned via SSO or admin. No uncontrolled account creation.
`DEFAULT_USER_ROLE`	`pending`	New SSO users require explicit admin approval before accessing any AI capabilities.
`BYPASS_MODEL_ACCESS_CONTROL`	`False`	Enforces RBAC model restrictions - users only see models assigned to their functional group.
`BYPASS_ADMIN_ACCESS_CONTROL`	`False`	Admins are subject to the same workspace access rules as regular users.
`ENABLE_COMMUNITY_SHARING`	`False`	No data, prompts, or model configurations shared to external community hubs.

RAG & Retrieval

Variable	Value	Rationale
`VECTOR_DB`	`pgvector`	Uses PostgreSQL's PGVector extension - one database for both application data and vector search.
`RAG_TOP_K`	`5`	Returns the top 5 most relevant document chunks. Tune based on document density.
`ENABLE_RAG_HYBRID_SEARCH`	`True`	BM25 + vector ensemble search with reranking. Recommended for scientific documents where exact terminology matters alongside semantic similarity.

Infrastructure

Variable	Value	Rationale
`DATABASE_URL`	`postgresql://...`	Required for multi-node. SQLite is not suited to concurrent writes from multiple application nodes.
`REDIS_URL`	`redis://redis:6379/0`	Session coordination across stateless nodes.
`WEBSOCKET_MANAGER`	`redis`	Routes streaming responses through Redis for multi-node consistency.
`ENABLE_DB_MIGRATIONS`	`True` (node-1 only)	Only one node should run migrations on startup to prevent race conditions.
`TERMINAL_SERVER_CONNECTIONS`	JSON array	Pre-configures the Open Terminal connection so it's available to all users on startup. Format: `[{"url":"http://open-terminal:8000","key":"<API_KEY>"}]`. Can also be configured manually in Admin Settings → Integrations.
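
The JSON value is easy to get wrong when quoting it by hand. As a minimal sketch, the hypothetical helper below prints the array in the format shown in the table; it performs no JSON escaping, so it assumes the URL and key contain no double quotes or backslashes.

```shell
#!/usr/bin/env bash
# terminal_connections_json: hypothetical helper that prints the JSON array
# expected by TERMINAL_SERVER_CONNECTIONS. No escaping is performed, so the
# URL and key must not contain double quotes or backslashes.
terminal_connections_json() {
    local url=$1 key=$2
    printf '[{"url":"%s","key":"%s"}]\n' "$url" "$key"
}
```

For example, `terminal_connections_json http://open-terminal:8000 "$OPEN_TERMINAL_API_KEY"` produces a value that can be pasted into .env inside single quotes.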

Redis note: The timeout 1800 setting in the Docker Compose Redis config is critical. Without it, idle connections accumulate until maxclients is exhausted and all logins fail. See the Open WebUI Redis documentation.
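
To confirm the running Redis instance picked up the idle timeout, its live configuration can be queried with `redis-cli CONFIG GET timeout`. The hypothetical helper below is a sketch that interprets that command's two-line output (setting name, then value); a value of 0 means idle connections are never closed.

```shell
#!/usr/bin/env bash
# redis_timeout_ok: reads the two-line output of `redis-cli CONFIG GET timeout`
# on stdin and succeeds only if the idle timeout is a number greater than zero.
redis_timeout_ok() {
    local value
    value=$(sed -n '2p' | tr -d '\r')
    case "$value" in
        ''|*[!0-9]*) return 2 ;;   # missing or non-numeric value
    esac
    [ "$value" -gt 0 ]
}
```

A possible invocation is `docker compose exec -T redis redis-cli CONFIG GET timeout | redis_timeout_ok && echo "idle timeout is set"`.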

Open Terminal

Variable	Value	Rationale
`OPEN_TERMINAL_API_KEY`	Generated secret	Bearer API key for authenticating requests from Open WebUI to the terminal container.
`OPEN_TERMINAL_PIP_PACKAGES`	`rdkit-pypi scikit-learn lifelines matplotlib seaborn`	Pre-installs scientific Python libraries at container startup. Scientists can install additional packages at runtime.
`OPEN_TERMINAL_MAX_SESSIONS`	`16`	Maximum concurrent interactive terminal sessions. Prevents resource exhaustion.

Technical Controls Reference

Important

Open WebUI is a general-purpose AI platform. It is not a validated GxP system, and nothing in this guide should be interpreted as a compliance determination. When deployed with the configuration described here, Open WebUI provides technical controls that your organization's quality team can evaluate as part of their own Computer System Validation (CSV) effort. The mappings below are informational - your validation team must independently verify that each control meets your regulatory obligations.

All AI-generated content is unvalidated and must be reviewed by qualified personnel before use in any clinical, regulatory, or safety-critical context.

Technical Controls Available (Validation Required)

The following table lists technical capabilities that Open WebUI provides when configured as described in this guide. These are not compliance claims. Your validation team must independently determine whether these controls satisfy your specific regulatory obligations.

Area	Technical Controls
Audit trail	Conversations are timestamped and attributed to an authenticated user. Chat deletion can be disabled at the application level via `USER_PERMISSIONS_CHAT_DELETE=False`.
System access	SSO/OIDC integration, role-based access control, `DEFAULT_USER_ROLE=pending` for approval workflow.
Authorization	RBAC restricts access to models, documents, and features by functional group.
Availability monitoring	Health checks, OpenTelemetry integration, and Redis session management support monitoring.
User identity	Each user signs in through SSO/OIDC with an individual account, so activity is attributed to an authenticated identity.
Deployment model	Can be deployed on internal infrastructure with no external dependencies when models are pre-loaded.
Data integrity	Chat deletion can be disabled at the application level. PostgreSQL write-ahead logging (WAL) protects committed writes against crashes. Backups can be automated.
Data migration	PostgreSQL `pg_dump`/`pg_restore` with integrity verification. Standard, well-documented process.
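Business continuity	Stateless application nodes behind a load balancer. Redis HA (Sentinel) and PostgreSQL WAL archiving for point-in-time recovery are supported but require configuration beyond the single-instance Compose file in this guide.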

What This Might Mean for a Validation Team

If your organization uses a risk-based approach to CSV (Computer System Validation), the GAMP categorization, validation scope, and testing depth are decisions your validation team must make based on your specific deployment, customizations, and intended use. Open WebUI's open-source codebase and container-based deployment with version-pinned images may facilitate aspects of your validation process, but the validation strategy itself is an organizational responsibility.

RBAC Configuration Guide

After first deployment, you can configure functional groups via the Admin Panel. The following is an example workflow - your organization should design its own group structure based on its functional areas, risk profile, and governance requirements.

Step 1: Configure OAuth / SSO

Navigate to Admin Panel → Settings → OAuth and configure your identity provider:

OPENID_PROVIDER_URL=https://login.yourcompany.com/.well-known/openid-configuration
OAUTH_CLIENT_ID=<your-client-id>
OAUTH_CLIENT_SECRET=<your-client-secret>
OAUTH_SCOPES=openid email profile groups
OAUTH_GROUP_CLAIM=groups
ENABLE_OAUTH_GROUP_MANAGEMENT=True
ENABLE_OAUTH_GROUP_CREATION=True
ENABLE_OAUTH_ROLE_MANAGEMENT=True

Tip: Set ENABLE_OAUTH_GROUP_MANAGEMENT=True so that functional group membership syncs automatically from your identity provider. When a scientist transfers from R&D to Medical Affairs in your directory, their Open WebUI permissions update on next login - no manual reprovisioning.
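
Group sync depends on the identity provider actually emitting the claim named in OAUTH_GROUP_CLAIM. When debugging, it can help to decode a test token and look at its payload. The helper below is a hypothetical sketch: it does not verify the token signature, and it assumes a `base64` command that accepts `-d` (as GNU coreutils does).

```shell
#!/usr/bin/env bash
# jwt_payload: hypothetical helper. Prints the decoded JSON payload (the
# second dot-separated segment) of a JWT. It does NOT verify the signature,
# so use it only to inspect claims while debugging.
jwt_payload() {
    local segment pad
    # Convert base64url characters back to standard base64.
    segment=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
    # base64url omits padding; restore it to a multiple of 4 characters.
    pad=$(( (4 - ${#segment} % 4) % 4 ))
    while [ "$pad" -gt 0 ]; do
        segment="${segment}="
        pad=$((pad - 1))
    done
    printf '%s' "$segment" | base64 -d
}
```

Running `jwt_payload "$ID_TOKEN"` on a token obtained from your identity provider's test tooling prints the claims as JSON, where the `groups` entry should list the user's functional groups. Avoid pasting production tokens into shared terminals or logs.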

Step 2: Create Functional Groups

Navigate to Admin Panel → Groups and create groups matching your organization's structure:

Biostatistics
- Models: All available models
- Knowledge bases: Analysis datasets, statistical analysis plans, CDISC standards libraries
- Permissions: Open Terminal enabled, code interpreter enabled, file upload enabled
- Rationale: Biostatisticians run survival analyses, enrollment dashboards, and forest plots. Open Terminal gives them a computational environment without requiring IT tickets.

Clinical Development
- Models: All available models
- Knowledge bases: Study protocols, investigator brochures, CRF templates, monitoring plan libraries
- Permissions: Document extraction enabled, file upload enabled, web search enabled
- Rationale: Clinical teams work with both internal protocols and public clinical trial registries. Web search enables ClinicalTrials.gov lookups. Document extraction helps process regulatory correspondence.

Manufacturing / CMC
- Models: All available models
- Knowledge bases: Batch records, process validation reports, equipment SOPs
- Permissions: Open Terminal enabled, file upload enabled
- Rationale: CMC scientists frequently upload batch records and deviation reports for AI-assisted root cause analysis. Open Terminal enables batch trend analysis and process parameter visualization.

Medical Affairs
- Models: All available models
- Knowledge bases: Product monographs, congress abstracts, KOL slide decks
- Permissions: Web search enabled, file upload enabled
- Rationale: Medical Affairs teams need access to public literature and congress proceedings alongside internal medical information.

Pharmacovigilance
- Models: Reasoning models only (e.g., Llama 3.1 70B via vLLM)
- Knowledge bases: MedDRA dictionaries, CIOMS forms, signal detection SOPs
- Permissions: RAG-only mode (no web search, no file upload)
- Rationale: PV work is safety-critical. Restricting to RAG-only mode prioritizes retrieval from curated internal documents and disables web search, reducing exposure to uncontrolled external content. The underlying model may still draw on its training data.

R&D / Discovery
- Models: All available models
- Knowledge bases: Compound libraries, assay protocols, literature databases
- Permissions: Open Terminal enabled, code interpreter enabled, file upload enabled, web search enabled
- Rationale: Discovery scientists need the broadest toolset - running SAR analyses and molecular modeling in Open Terminal, uploading proprietary assay results, and searching public literature.

Regulatory Affairs
- Models: All available models
- Knowledge bases: eCTD templates, FDA/EMA guidance, precedent correspondence
- Permissions: Document extraction enabled, file upload enabled
- Rationale: Regulatory scientists frequently need to extract structured data from FDA letters, EMA assessment reports, and deficiency notices.

Support Staff
- Models: Small models only (e.g., Llama 3.1 8B via Ollama)
- Knowledge bases: Company policies, HR procedures, training materials
- Permissions: No file upload, no web search, no terminal access
- Rationale: Minimal access footprint for non-scientific users.

Step 3: Assign Models to Groups

For each model in Admin Panel → Models:

1. Set visibility to Private (not Public)
2. Under Access Control, add the groups that should have access
3. Ensure BYPASS_MODEL_ACCESS_CONTROL=False in your environment so these restrictions are enforced

Step 4: Assign Knowledge Bases to Groups

For each knowledge base in Admin Panel → Knowledge:

Set access control to the relevant functional groups
Users will only see knowledge bases assigned to their group(s) in the chat interface

Step 5: Install the Inline Visualizer Tool & Skill

The Inline Visualizer plugin renders interactive HTML/SVG visualizations directly in the chat. It includes a theme-aware design system with color ramps, SVG utility classes, and a communication bridge that lets visualizations send prompts back to the chat for conversational exploration.

This plugin has two components:

Component	File	Install Location
Tool	`tool.py`	Workspace → Tools
Skill	`SKILL.md`	Workspace → Knowledge → Create Skill

Install the Tool:

1. Copy the contents of tool.py
2. In Open WebUI, go to Workspace → Tools → + Create New
3. Paste the code and click Save

Install the Skill:

1. Copy the contents of SKILL.md
2. In Open WebUI, go to Workspace → Knowledge → + Create Skill
3. Name it visualize (this exact name is required)
4. Paste the contents and click Save

Attach to Models:

1. Go to Admin Panel → Models and edit each model that should support visualizations
2. Under Tools, enable the Inline Visualizer tool
3. Under Skills, attach the visualize skill
4. Ensure native function calling is enabled for the model
5. Save

Enable Interactive Features (Optional):

1. Go to Settings → Interface
2. Enable iframe Sandbox Allow Same Origin

Without this, visualizations render normally but interactive buttons that send prompts back to the chat (sendPrompt) will not work.

Tip: A strong model is required for complex, visually detailed interactive visualizations. Tested with Claude Haiku 4.5 and Claude Opus 4.5.

Knowledge Base Setup Guide

Open WebUI's RAG system ingests documents and creates searchable vector embeddings in PGVector. This section provides an example knowledge base design for pharmaceutical contexts.

Recommended Knowledge Base Structure

Knowledge Base	Contents	Functional Groups	Notes
`Compound Library`	Structures, SAR data, screening results, MoA summaries	R&D / Discovery	High sensitivity - restrict strictly to R&D
`Assay Protocols`	Standard assay procedures, validation data, reference standards	R&D / Discovery
`Clinical Protocols`	Study protocols, ICH E6/E8/E9 references, SAPs	Clinical Development
`CRF Templates`	Case report forms, data management plans, reconciliation guides	Clinical Development
`Statistical Methods`	SAPs, CDISC standards, analysis dataset specifications	Biostatistics
`Regulatory Guidance`	FDA guidance library, EMA guidelines, ICH harmonized guidelines	Regulatory Affairs	Consider splitting by region (FDA/EMA/PMDA)
`Submission Templates`	eCTD module templates, cover letters, precedent review correspondence	Regulatory Affairs
`PV Reference`	MedDRA hierarchy, CIOMS forms, signal detection SOPs, PSUR templates	Pharmacovigilance	Reviewed documents only - no drafts
`Manufacturing SOPs`	Batch records, process validation reports, equipment qualification docs	Manufacturing / CMC
`Medical Information`	Product monographs, SmPCs, congress posters, medical response letters	Medical Affairs
`Company Policies`	HR handbook, compliance policies, IT security procedures, training guides	All groups

Upload Workflow

1. Navigate to Workspace → Knowledge → Create Knowledge Base
2. Name the knowledge base and set the access control to the relevant functional group(s)
3. Upload documents (supported formats: PDF, DOCX, TXT, Markdown, HTML, CSV, XLSX, PPTX)
4. Open WebUI automatically:
   - Extracts text from uploaded documents
   - Chunks the content for optimal retrieval
   - Generates vector embeddings and stores them in PGVector
5. Users in the assigned groups can now reference this knowledge base in chat by typing # followed by the knowledge base name

RAG Best Practices

Regulatory submissions: Large eCTD modules should be split by section (e.g., upload Module 2.3 Quality Overall Summary separately from Module 3.2.P Drug Product). This tends to improve retrieval precision, because each retrieved chunk then comes from a single, clearly scoped section.
SOPs and batch records: These are typically well-structured documents that RAG handles effectively. Use descriptive filenames that include the SOP number and revision (e.g., SOP-MFG-042-Rev3-Tablet-Compression.pdf).
Literature databases: For large literature collections (1,000+ papers), consider organizing into topic-specific knowledge bases rather than one monolithic collection. This lets users target their retrieval.
Citation verification: RAG provides relevance scores with each retrieved chunk. Scientists must always verify citations against the source - RAG reduces hallucination but does not eliminate it. All AI-generated content must be reviewed by qualified personnel before use in any clinical, regulatory, or safety-critical context. This is especially critical for PV and regulatory use cases.
Version control: When SOPs are revised or guidance documents update, upload the new version and remove the old one. Knowledge bases can be updated without downtime. Maintain a document version log outside Open WebUI for your QMS.
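
The descriptive-filename advice for SOPs can be checked before anything is uploaded. The following is a minimal sketch, not part of Open WebUI: the function names are hypothetical, and the pattern is only inferred from the single example filename above (SOP-MFG-042-Rev3-Tablet-Compression.pdf), so it should be adapted to the numbering scheme your QMS actually uses.

```shell
# Hypothetical naming check inferred from the example
# SOP-MFG-042-Rev3-Tablet-Compression.pdf; adjust the pattern to your QMS.
# Expected shape: SOP-<AREA>-<NUMBER>-Rev<N>-<Title-Words>.pdf
is_valid_sop_name() {
    # Strip any leading directory so only the file name is checked.
    name=$(basename "$1")
    printf '%s\n' "$name" |
        grep -Eq '^SOP-[A-Z]+-[0-9]+-Rev[0-9]+-[A-Za-z0-9]+(-[A-Za-z0-9]+)*\.pdf$'
}

# Print every file in a directory that does not follow the convention.
# Returns non-zero if at least one file needs renaming.
list_misnamed_sops() {
    dir=$1
    rc=0
    for file in "$dir"/*; do
        [ -f "$file" ] || continue
        if ! is_valid_sop_name "$file"; then
            printf 'rename before upload: %s\n' "$file"
            rc=1
        fi
    done
    return "$rc"
}

# Usage (the directory is a placeholder):
#   list_misnamed_sops ./sops-to-upload
```

Running the check on the staging folder before each upload keeps SOP numbers and revisions visible in citations, which also makes the version log for your QMS easier to reconcile.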

Embedding Model Selection

The Docker Compose stack pulls nomic-embed-text via Ollama for generating embeddings locally. Configure this in Admin Panel → Settings → Documents → Embedding Model.

For higher-quality embeddings (recommended for 10,000+ document deployments), consider using a dedicated embedding endpoint. Set RAG_OPENAI_API_BASE_URL to point to a self-hosted embedding service or use Ollama's built-in embedding support (both options keep data on your infrastructure when configured accordingly).

Security Hardening Checklist

The following checklist describes operational security measures. This is not a compliance checklist - your organization's quality, security, and compliance teams should determine which items apply to your environment and what additional measures are needed.

Network Layer

TLS 1.2+ enforced on the reverse proxy - no plaintext HTTP traffic reaches Open WebUI
Only port 443 is exposed to the user network; all other services are on internal Docker network
HSTS header set with max-age=63072000; includeSubDomains
Security headers: X-Content-Type-Options: nosniff, X-Frame-Options: SAMEORIGIN, Referrer-Policy: strict-origin-when-cross-origin
Rate limiting configured on the reverse proxy to prevent abuse
DNS resolves only to the reverse proxy - no direct access to application nodes
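
The header items in this list can be spot-checked from any machine that can reach the reverse proxy. The sketch below is an illustration under assumptions: check_security_headers is a hypothetical helper, and chat.example.com is a placeholder host. It only confirms that each header is present; it does not validate the header values.

```shell
# Check a block of HTTP response headers (read from stdin) for the headers
# named in the Network Layer checklist. Matching is case-insensitive because
# HTTP header names are not case-sensitive.
check_security_headers() {
    headers=$(cat)
    missing=0
    for name in \
        strict-transport-security \
        x-content-type-options \
        x-frame-options \
        referrer-policy
    do
        if ! printf '%s\n' "$headers" | grep -qi "^${name}:"; then
            printf 'missing header: %s\n' "$name"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (chat.example.com is a placeholder host):
#   curl -sI https://chat.example.com | check_security_headers
```

Reading headers from stdin keeps the function independent of how they were captured, so the same check works against a saved response or a live request.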

Authentication & Authorization

ENABLE_SIGNUP=False - no self-registration
DEFAULT_USER_ROLE=pending - new SSO users require admin approval
SSO/OIDC configured with your organization's identity provider
ENABLE_OAUTH_GROUP_MANAGEMENT=True - groups sync from IdP
ENABLE_OAUTH_ROLE_MANAGEMENT=True - roles sync from IdP
BYPASS_MODEL_ACCESS_CONTROL=False - RBAC enforced on model access
BYPASS_ADMIN_ACCESS_CONTROL=False - admins subject to workspace ACLs

Data Protection

ENABLE_ADMIN_CHAT_ACCESS=False - restricts IT administrators from viewing user conversation content at the application level
ENABLE_ADMIN_EXPORT=False - disables bulk data extraction at the application level
USER_PERMISSIONS_CHAT_DELETE=False - disables chat deletion at the application level
USER_PERMISSIONS_CHAT_TEMPORARY=False - no unlogged conversations
ENABLE_COMMUNITY_SHARING=False - no external data sharing
PostgreSQL configured with encryption at rest (transparent data encryption or full-disk encryption on the host)
Redis requirepass set if Redis is network-accessible (not needed when Redis is internal-only via Docker network)
Backup encryption enabled (see Backup & Disaster Recovery)
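
The environment flags in the Authentication & Authorization and Data Protection lists lend themselves to an automated check, for example in a pre-deployment pipeline. The following sketch assumes the flags live in a plain KEY=value env file; audit_env_file is a hypothetical helper, and the expected values are simply the ones listed above. Settings that were later changed in the Admin Panel are not visible to a file-based check like this.

```shell
# Compare KEY=value pairs in an env file against the values recommended in
# the Authentication & Authorization and Data Protection checklists.
# Values are compared case-insensitively (False matches false).
audit_env_file() {
    env_file=$1
    failures=0
    for expected in \
        ENABLE_SIGNUP=False \
        DEFAULT_USER_ROLE=pending \
        ENABLE_OAUTH_GROUP_MANAGEMENT=True \
        ENABLE_OAUTH_ROLE_MANAGEMENT=True \
        BYPASS_MODEL_ACCESS_CONTROL=False \
        BYPASS_ADMIN_ACCESS_CONTROL=False \
        ENABLE_ADMIN_CHAT_ACCESS=False \
        ENABLE_ADMIN_EXPORT=False \
        USER_PERMISSIONS_CHAT_DELETE=False \
        USER_PERMISSIONS_CHAT_TEMPORARY=False \
        ENABLE_COMMUNITY_SHARING=False
    do
        key=${expected%%=*}
        want=${expected#*=}
        # Take the last occurrence if a key is repeated; strip quotes.
        got=$(sed -n "s/^${key}=//p" "$env_file" | tail -n 1 | tr -d "\"'")
        got_lc=$(printf '%s' "$got" | tr '[:upper:]' '[:lower:]')
        want_lc=$(printf '%s' "$want" | tr '[:upper:]' '[:lower:]')
        if [ "$got_lc" != "$want_lc" ]; then
            printf 'FAIL %s: expected %s, found %s\n' "$key" "$want" "${got:-<unset>}"
            failures=$((failures + 1))
        fi
    done
    [ "$failures" -eq 0 ]
}

# Usage:
#   audit_env_file .env || echo "hardening flags need review"
```

An unset key is reported as a failure on purpose: relying on application defaults makes the deployed configuration harder to audit than an explicit value.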

Model & Inference Security

When configured for local-only inference, all models run via Ollama or vLLM on your infrastructure
Hugging Face token is stored only in .env, not committed to version control
.env file has restrictive permissions: chmod 600 .env
For production deployments, consider migrating secrets from .env to a dedicated secrets manager (e.g., HashiCorp Vault, AWS Secrets Manager)
vLLM API key (VLLM_API_KEY) is set to prevent unauthorized direct access to the inference endpoint
Docker image tags pinned to specific versions (not :main or :latest) for reproducible, auditable deployments
If Functions are used: LLM-Guard or equivalent function installed for prompt injection scanning
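
Two of the items above, restrictive permissions on .env and keeping it out of version control, are easy to verify mechanically. This is a sketch with hypothetical helper names; it assumes either GNU or BSD stat is available, and check_env_untracked must be run from inside the Git repository to be meaningful (outside a repository it reports nothing).

```shell
# Print the octal permission bits of a file. GNU stat (Linux) and BSD stat
# (macOS) use different flags, so try one and fall back to the other.
file_mode() {
    stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1" 2>/dev/null
}

# Succeeds only if the file is readable and writable by its owner alone.
check_env_permissions() {
    mode=$(file_mode "$1") || return 1
    if [ "$mode" != "600" ]; then
        printf '%s has mode %s, expected 600 (run: chmod 600 %s)\n' "$1" "$mode" "$1"
        return 1
    fi
}

# Succeeds only if git is NOT tracking the file.
check_env_untracked() {
    if git ls-files --error-unmatch "$1" >/dev/null 2>&1; then
        printf '%s is tracked by git (untrack with: git rm --cached %s)\n' "$1" "$1"
        return 1
    fi
}

# Usage, from the project directory:
#   check_env_permissions .env && check_env_untracked .env
```

Note that untracking a file does not remove it from earlier commits; if secrets were ever committed, they should be rotated.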

Open Terminal Security

OPEN_TERMINAL_API_KEY is set - without it, anyone who can reach the port has full access
Open Terminal container is on the internal Docker network only - not exposed to external traffic
Resource limits applied: memory: 4G and cpus: 4.0 (or appropriate for your environment)
OPEN_TERMINAL_MAX_SESSIONS=16 to prevent resource exhaustion from concurrent terminal sessions
Docker socket (/var/run/docker.sock) is not mounted unless explicitly required and the environment is trusted
Named volume mounted at /home/user for file persistence across container restarts
Open Terminal access restricted to appropriate functional groups via Admin Settings → Integrations

Operational Security

ENABLE_DB_MIGRATIONS=True on exactly one node; False on all others
Redis maxclients set to 10000+ and timeout set to 1800
Log aggregation configured (OpenTelemetry, Splunk, Datadog, or equivalent)
Alerting on container restarts, database connection failures, and GPU memory exhaustion
Regular security updates for base images: bump the pinned image tags to the patched versions, then run docker compose pull && docker compose up -d
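
Whether image tags are actually pinned can be checked against the Compose file itself. The sketch below uses a hypothetical helper, find_unpinned_images, and treats an image as unpinned if it has no tag or uses :latest or :main; references pinned by digest (image@sha256:...) are accepted. It is a line-based scan that assumes one image: key per line, not a full YAML parser.

```shell
# List image references in a Compose file that are not pinned to a specific
# version: tags :latest or :main, or no tag at all. Digest references
# (image@sha256:...) count as pinned. Returns non-zero if any are found.
find_unpinned_images() {
    compose_file=$1
    # Extract the value of every "image:" key, dropping comments and quotes.
    sed -n 's/^[[:space:]]*image:[[:space:]]*//p' "$compose_file" |
        sed -e 's/[[:space:]]*#.*$//' -e "s/[\"']//g" |
        awk '
            /@sha256:/ { next }              # pinned by digest
            {
                ref = $0
                sub(/^.*\//, "", ref)        # keep the last path segment only,
                                             # so a registry port is not read as a tag
                if (ref !~ /:/ || ref ~ /:(latest|main)$/) {
                    print "unpinned image: " $0
                    found = 1
                }
            }
            END { exit found }
        '
}

# Usage:
#   find_unpinned_images docker-compose.yml
```

Because the function exits non-zero when it finds a floating tag, it can serve as a gate in a deployment pipeline so that an unpinned image never reaches production unnoticed.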

Backup & Disaster Recovery

What to Back Up

Component	Data Location	Backup Method
PostgreSQL	`postgres-data` volume	`pg_dump` (logical) or continuous WAL archiving
Redis	`redis-data` volume	AOF + RDB snapshots (handled by Redis config)
Ollama models	`ollama-data` volume	Volume snapshot or re-pull (models are public)
Open WebUI data	`owui-data` volume	Volume snapshot
Open Terminal data	`terminal-data` volume	Volume snapshot (or ephemeral - rebuild on demand)
Configuration	`.env`, `nginx/`, `docker-compose.yml`	Git repository for non-secret files; keep `.env` out of Git and store it in a secrets manager or an encrypted backup
TLS certificates	`nginx/certs/`	Certificate management system

Automated PostgreSQL Backup Script

Save the script below as /opt/openwebui/backup-postgres.sh, make it executable (chmod +x), and schedule it with cron (crontab -e) or your scheduling system:

#!/usr/bin/env bash
# Daily PostgreSQL backup - run via cron at 02:00 UTC
# 0 2 * * * /opt/openwebui/backup-postgres.sh

set -euo pipefail

# cron does not start in the project directory, but docker compose needs it.
# Assumes this script sits next to docker-compose.yml.
cd "$(dirname "$0")"

BACKUP_DIR="/opt/openwebui/backups"
RETENTION_DAYS=30
TIMESTAMP=$(date -u +"%Y%m%d_%H%M%S")
# Custom-format dumps are compressed binary archives, not gzipped SQL,
# so the file is named .dump rather than .sql.gz
BACKUP_FILE="${BACKUP_DIR}/openwebui_${TIMESTAMP}.dump"

mkdir -p "${BACKUP_DIR}"

docker compose exec -T postgres pg_dump \
    -U "${POSTGRES_USER:-openwebui}" \
    -d "${POSTGRES_DB:-openwebui}" \
    --format=custom \
    --compress=9 \
    > "${BACKUP_FILE}"

# Verify backup integrity using pg_restore inside the container, so the host
# needs no PostgreSQL client tools. Exit before pruning if verification fails.
if docker compose exec -T postgres pg_restore --list < "${BACKUP_FILE}" > /dev/null 2>&1; then
    echo "[OK] Backup verified: ${BACKUP_FILE}"
else
    echo "[ERROR] Backup verification failed: ${BACKUP_FILE}" >&2
    exit 1
fi

# Prune old backups
find "${BACKUP_DIR}" -name "openwebui_*.dump" -mtime "+${RETENTION_DAYS}" -delete

echo "[INFO] Backup complete. Size: $(du -h "${BACKUP_FILE}" | cut -f1)"
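
The hardening checklist asks for encrypted backups, which the backup script does not do on its own. One possible approach, sketched here under assumptions, is to encrypt each dump with openssl enc (AES-256-CBC with PBKDF2 key derivation) using a passphrase held in a key file. The function names and the key file location are hypothetical, and openssl enc does not authenticate the ciphertext, so teams that need tamper detection may prefer a tool such as GnuPG or their backup platform's built-in encryption.

```shell
# Encrypt a backup file with a passphrase read from the first line of a key
# file. Writes <file>.enc and removes the plaintext only after encryption
# succeeds.
encrypt_backup() {
    backup_file=$1
    key_file=$2
    openssl enc -aes-256-cbc -pbkdf2 -salt \
        -in "$backup_file" -out "${backup_file}.enc" \
        -pass "file:${key_file}" &&
        rm -f "$backup_file"
}

# Decrypt <file>.enc back to <file> before running pg_restore.
decrypt_backup() {
    encrypted_file=$1
    key_file=$2
    openssl enc -d -aes-256-cbc -pbkdf2 \
        -in "$encrypted_file" -out "${encrypted_file%.enc}" \
        -pass "file:${key_file}"
}

# Usage (the key file path is a placeholder; protect it with chmod 600):
#   encrypt_backup "$BACKUP_FILE" /opt/openwebui/backup.key
```

If encryption is added after the verification step of the backup script, the pruning step's find pattern would also need to match the .enc suffix so that encrypted files expire on schedule. Store the key file separately from the backups; a backup that sits next to its own key is not meaningfully protected.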

Recovery Procedure

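1. Stop Open WebUI nodes: docker compose stop open-webui-1 open-webui-2
2. Restore PostgreSQL: docker compose exec -T postgres pg_restore -U openwebui -d openwebui --clean --if-exists < "$BACKUP_FILE" (where $BACKUP_FILE is the path of the dump to restore; the custom-format archive is passed to pg_restore as-is, without decompressing it first)
3. Restart: docker compose up -d
4. Verify: Check health endpoints and run a test query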
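
The verification step can be scripted so that a restore is only declared complete once the application responds. The sketch below is a generic retry helper; wait_until is a hypothetical name, and the URL in the usage comment is a placeholder that assumes your deployment exposes a health endpoint at /health behind the reverse proxy.

```shell
# Retry a command until it succeeds or roughly <timeout> seconds have passed.
# The elapsed time counts only the sleeps between attempts, not the runtime
# of the command itself.
# Usage: wait_until <timeout> <interval> <command> [args...]
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        elapsed=$((elapsed + interval))
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
    done
    return 0
}

# Usage after "docker compose up -d" (placeholder URL):
#   wait_until 120 5 curl -fsS https://chat.example.com/health \
#       && echo "[OK] Open WebUI is answering" \
#       || echo "[ERROR] Open WebUI did not become healthy in time"
```

A passing health check only shows that the application is up. The test query in the last step is still needed to confirm that restored chats and knowledge bases are actually readable.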

RPO / RTO Targets

Scenario	RPO (Data Loss)	RTO (Downtime)
Single node failure	0 (stateless, auto-recovered)	< 30 seconds (health check interval)
Database corruption	≤ 24 hours (daily backups)	< 1 hour
Full infrastructure loss	≤ 24 hours	2–4 hours (restore from backups)
With WAL archiving	≤ 5 minutes	< 1 hour

For mission-critical deployments, enable PostgreSQL WAL archiving for point-in-time recovery with an RPO of minutes rather than hours.

This guide is maintained alongside What Would It Take for a Pharma Company to Run AI On Its Own Infrastructure?. For questions about enterprise deployment, contact sales@openwebui.com.
