This guide is a technical reference companion to the blog post "What Would It Take for a Pharma Company to Run AI On Its Own Infrastructure?" It walks through one possible production architecture for self-hosting Open WebUI, along with configuration examples that organizations in regulated industries may find relevant. This is a starting point for evaluation, not a prescriptive deployment guide - your organization's engineering, quality, and compliance teams should adapt this architecture to your specific requirements.
- Architecture Overview
- Pre-Requisites
- Docker Compose Reference
- Setup Script
- Environment Variable Reference
- Technical Controls Reference
- RBAC Configuration Guide
- Knowledge Base Setup Guide
- Security Hardening Checklist
- Backup & Disaster Recovery
The production stack is identical to any Open WebUI enterprise deployment: reverse proxy with TLS, stateless application nodes, PostgreSQL + PGVector for data and vector search, Redis for session coordination, local inference via Ollama and vLLM, and Open Terminal for sandboxed code execution. See the blog post for the architecture diagram and rationale.
This deployment also includes the Inline Visualizer tool and skill, which renders interactive HTML/SVG visualizations directly in the chat. When combined with Open Terminal for computational work, this gives scientists two complementary paths to visual output: Open Terminal for publication-quality figures generated by matplotlib, RDKit, and other scientific Python libraries, and Inline Visualizer for interactive diagrams, flowcharts, and explorable visuals rendered natively in the conversation.
What makes this configuration different from a generic deployment isn't the infrastructure - it's the configuration layer on top: how access is structured, how knowledge bases map to functional groups, and how audit records are retained. The settings below are examples of how organizations in regulated industries have approached these decisions.
Example configuration decisions:
- `ENABLE_ADMIN_CHAT_ACCESS=False` - Restricts IT administrators from viewing user conversation content at the application level.
- `USER_PERMISSIONS_CHAT_DELETE=False` + `USER_PERMISSIONS_CHAT_TEMPORARY=False` - Disables chat deletion and temporary chats at the application level, so AI interactions are persisted with timestamps.
- `ENABLE_COMMUNITY_SHARING=False` - Disables sharing of data, prompts, or model configurations externally.
- `BYPASS_MODEL_ACCESS_CONTROL=False` - Enforces functional group boundaries. A CMC scientist sees manufacturing models and documents; a PV officer sees pharmacovigilance resources. This helps prevent cross-group data exposure.
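The settings above can be spot-checked against a running node. The following is a minimal sketch, not part of Open WebUI: `audit_env` is a hypothetical helper that reads `KEY=VALUE` lines on stdin and reports any of the five settings that differ from the values listed above. It inspects environment variables only and cannot see settings changed afterwards in the Admin Panel.

```shell
# Hypothetical helper (an assumption for illustration, not an Open WebUI tool).
# Reads KEY=VALUE lines on stdin and reports settings that differ from the
# example values listed above. Returns non-zero if any mismatch is found.
audit_env() {
  local actual status=0 pair key want got
  actual="$(cat)"
  for pair in \
    "ENABLE_ADMIN_CHAT_ACCESS=False" \
    "USER_PERMISSIONS_CHAT_DELETE=False" \
    "USER_PERMISSIONS_CHAT_TEMPORARY=False" \
    "ENABLE_COMMUNITY_SHARING=False" \
    "BYPASS_MODEL_ACCESS_CONTROL=False"; do
    key="${pair%%=*}"
    want="${pair#*=}"
    # Use the last assignment of the key, if there is one.
    got="$(printf '%s\n' "$actual" | grep -E "^${key}=" | tail -n 1 | cut -d= -f2- || true)"
    if [ "$got" != "$want" ]; then
      echo "MISMATCH ${key}: expected ${want}, found ${got:-<unset>}"
      status=1
    fi
  done
  return "$status"
}

# Example (not run here; assumes the service name used in this guide):
#   docker compose exec -T open-webui-1 env | audit_env
```

Running the check on every node after a configuration change is one way to catch drift between the duplicated environment blocks.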
For an organization with 500–10,000+ employees and concurrent usage of ~100–500 users:
| Component | Minimum | Recommended |
|---|---|---|
| Open WebUI nodes | 2× (4 vCPU, 8 GB RAM each) | 3× (8 vCPU, 16 GB RAM each) |
| PostgreSQL | 4 vCPU, 16 GB RAM, 500 GB SSD | 8 vCPU, 32 GB RAM, 1 TB NVMe |
| Redis | 2 vCPU, 4 GB RAM | 2 vCPU, 8 GB RAM (Sentinel: 3 nodes) |
| Ollama (small models, ≤13B) | 1× NVIDIA GPU (24 GB VRAM, e.g., RTX 4090) | 2× GPUs behind load balancer |
| vLLM (large models, 70B+) | 2× NVIDIA A100 80 GB (tensor parallel) | 4× A100 80 GB or 2× H100 |
| Shared storage | 1 TB S3-compatible or NFS | 5 TB+ with lifecycle policies |
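The vLLM row follows from simple arithmetic. As a rough sketch, assuming 16-bit weights (2 bytes per parameter) and ignoring KV cache and activation memory, a 70B model needs about 140 GB for weights alone, which is more than one 80 GB card can hold. The function names below are hypothetical.

```shell
# weights_gb PARAMS_BILLIONS BYTES_PER_PARAM -> approximate weight size in GB
# (1 billion parameters at 1 byte each is roughly 1 GB)
weights_gb() { echo $(( $1 * $2 )); }

# min_gpus WEIGHTS_GB VRAM_GB -> ceiling of weights / VRAM.
# This is a lower bound: KV cache and activations need additional memory.
min_gpus() { echo $(( ($1 + $2 - 1) / $2 )); }

weights_gb 70 2    # prints 140
min_gpus 140 80    # prints 2
```

Because this is only a lower bound, the "Recommended" column leaves headroom for longer contexts and concurrent requests.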
- Docker Engine ≥ 24.0 and Docker Compose ≥ 2.20
- NVIDIA Container Toolkit (for GPU nodes) - see NVIDIA's installation guide
- TLS certificates - from your organization's internal CA or Let's Encrypt
- LDAP / SSO credentials - for OAuth/OIDC integration (Okta, Azure AD, Ping Identity, etc.)
- DNS entry - e.g., `ai.yourcompany.com` pointing to the reverse proxy
- All services communicate on an internal Docker network - no public exposure except the reverse proxy
- Outbound internet access is not required if models are pre-pulled (fully air-gappable)
- Ports: only `443` (HTTPS) serves traffic externally; the Compose file also publishes `80`, solely to redirect to HTTPS
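The version minimums above can be checked before deployment. This is a minimal sketch: `version_ge` and `check_docker_versions` are hypothetical helpers, and the comparison relies on GNU `sort -V`.

```shell
# version_ge ACTUAL MINIMUM -> succeeds when ACTUAL >= MINIMUM.
# Relies on GNU sort -V (version sort): the smaller version sorts first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical wrapper; defined here but not called, because it needs a
# working Docker installation.
check_docker_versions() {
  local docker_v compose_v
  docker_v="$(docker version --format '{{.Server.Version}}')"
  compose_v="$(docker compose version --short)"
  compose_v="${compose_v#v}"   # tolerate a leading "v"
  version_ge "$docker_v" "24.0"  || { echo "Docker Engine ${docker_v} is older than 24.0"; return 1; }
  version_ge "$compose_v" "2.20" || { echo "Docker Compose ${compose_v} is older than 2.20"; return 1; }
}
```

Version sort matters here: a plain string comparison would rank `2.3` above `2.20`.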
Save this as docker-compose.yml in your deployment directory. An accompanying .env file is generated by the setup script below.
```yaml
# =============================================================================
# Open WebUI - Production Stack
# =============================================================================
# Usage:
#   1. Run ./setup.sh to generate .env and required directories
#   2. docker compose up -d
#   3. Access via https://ai.yourcompany.com
# =============================================================================

services:
  # ---------------------------------------------------------------------------
  # Reverse Proxy - TLS termination and load balancing
  # ---------------------------------------------------------------------------
  nginx:
    image: nginx:alpine
    container_name: owui-proxy
    restart: unless-stopped
    ports:
      - "443:443"
      - "80:80" # Redirect to HTTPS
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/certs:/etc/nginx/certs:ro
    depends_on:
      open-webui-1:
        condition: service_healthy
    networks:
      - owui-net

  # ---------------------------------------------------------------------------
  # Open WebUI - Stateless application nodes
  # ---------------------------------------------------------------------------
  open-webui-1:
    image: ghcr.io/open-webui/open-webui:0.6 # Pin to a specific version for production environments
    container_name: owui-node-1
    restart: unless-stopped
    environment:
      # --- Core ---
      - WEBUI_URL=${WEBUI_URL}
      - WEBUI_NAME=${WEBUI_NAME:-AI Assistant}
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}
      - PORT=8080
      # --- Database ---
      - DATABASE_URL=postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}
      # --- Vector DB (PGVector, same PostgreSQL instance) ---
      - VECTOR_DB=pgvector
      - PGVECTOR_DB_URL=postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}
      # --- Redis ---
      - REDIS_URL=redis://redis:6379/0
      - WEBSOCKET_MANAGER=redis
      - WEBSOCKET_REDIS_URL=redis://redis:6379/0
      - ENABLE_WEBSOCKET_SUPPORT=True
      # --- Inference backends ---
      - OLLAMA_BASE_URL=http://ollama:11434
      - ENABLE_OLLAMA_API=True
      - OPENAI_API_BASE_URL=http://vllm:8000/v1
      - OPENAI_API_KEY=${VLLM_API_KEY:-sk-none}
      - ENABLE_OPENAI_API=True
      # --- Security defaults ---
      - ENABLE_SIGNUP=False
      - DEFAULT_USER_ROLE=pending
      - ENABLE_ADMIN_CHAT_ACCESS=False
      - ENABLE_ADMIN_EXPORT=False
      - BYPASS_MODEL_ACCESS_CONTROL=False
      - BYPASS_ADMIN_ACCESS_CONTROL=False
      - ENABLE_COMMUNITY_SHARING=False
      # --- User permissions ---
      - USER_PERMISSIONS_CHAT_DELETE=False
      - USER_PERMISSIONS_CHAT_TEMPORARY=False
      # --- RAG tuning ---
      - RAG_TOP_K=5
      - RAG_SYSTEM_CONTEXT=True
      - ENABLE_RAG_HYBRID_SEARCH=True
      # --- Admin provisioning (first startup only) ---
      - WEBUI_ADMIN_EMAIL=${ADMIN_EMAIL}
      - WEBUI_ADMIN_PASSWORD=${ADMIN_PASSWORD}
      - WEBUI_ADMIN_NAME=${ADMIN_NAME:-IT Admin}
      # --- Workers ---
      - UVICORN_WORKERS=${UVICORN_WORKERS:-4}
      - ENABLE_DB_MIGRATIONS=True # Only on node-1; set False on others
      # --- Observability (optional) ---
      - ENABLE_OTEL=${ENABLE_OTEL:-False}
      - OTEL_EXPORTER_OTLP_ENDPOINT=${OTEL_ENDPOINT:-}
      # --- Open Terminal integration ---
      - TERMINAL_SERVER_CONNECTIONS=${TERMINAL_SERVER_CONNECTIONS:-[]}
      # --- Persistent config ---
      - ENABLE_PERSISTENT_CONFIG=True
    volumes:
      - owui-data:/app/backend/data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - owui-net

  # Note: open-webui-2 duplicates the environment from open-webui-1 because
  # Docker Compose list-style environment blocks do not support YAML merge keys.
  # If you add or change a variable above, update it here as well.
  open-webui-2:
    image: ghcr.io/open-webui/open-webui:0.6 # Pin to a specific version for production environments
    container_name: owui-node-2
    restart: unless-stopped
    environment:
      - WEBUI_URL=${WEBUI_URL}
      - WEBUI_NAME=${WEBUI_NAME:-AI Assistant}
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}
      - PORT=8080
      - DATABASE_URL=postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}
      - VECTOR_DB=pgvector
      - PGVECTOR_DB_URL=postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB}
      - REDIS_URL=redis://redis:6379/0
      - WEBSOCKET_MANAGER=redis
      - WEBSOCKET_REDIS_URL=redis://redis:6379/0
      - ENABLE_WEBSOCKET_SUPPORT=True
      - OLLAMA_BASE_URL=http://ollama:11434
      - ENABLE_OLLAMA_API=True
      - OPENAI_API_BASE_URL=http://vllm:8000/v1
      - OPENAI_API_KEY=${VLLM_API_KEY:-sk-none}
      - ENABLE_OPENAI_API=True
      - ENABLE_SIGNUP=False
      - DEFAULT_USER_ROLE=pending
      - ENABLE_ADMIN_CHAT_ACCESS=False
      - ENABLE_ADMIN_EXPORT=False
      - BYPASS_MODEL_ACCESS_CONTROL=False
      - BYPASS_ADMIN_ACCESS_CONTROL=False
      - ENABLE_COMMUNITY_SHARING=False
      - USER_PERMISSIONS_CHAT_DELETE=False
      - USER_PERMISSIONS_CHAT_TEMPORARY=False
      - RAG_TOP_K=5
      - RAG_SYSTEM_CONTEXT=True
      - ENABLE_RAG_HYBRID_SEARCH=True
      - WEBUI_ADMIN_EMAIL=${ADMIN_EMAIL}
      - WEBUI_ADMIN_PASSWORD=${ADMIN_PASSWORD}
      - WEBUI_ADMIN_NAME=${ADMIN_NAME:-IT Admin}
      - UVICORN_WORKERS=${UVICORN_WORKERS:-4}
      - ENABLE_DB_MIGRATIONS=False # Node-1 handles migrations
      - ENABLE_OTEL=${ENABLE_OTEL:-False}
      - OTEL_EXPORTER_OTLP_ENDPOINT=${OTEL_ENDPOINT:-}
      - TERMINAL_SERVER_CONNECTIONS=${TERMINAL_SERVER_CONNECTIONS:-[]}
      - ENABLE_PERSISTENT_CONFIG=True
    volumes:
      - owui-data:/app/backend/data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - owui-net

  # ---------------------------------------------------------------------------
  # PostgreSQL 16 + PGVector - Database and vector store
  # ---------------------------------------------------------------------------
  postgres:
    image: pgvector/pgvector:pg16
    container_name: owui-postgres
    restart: unless-stopped
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=${POSTGRES_DB}
    volumes:
      - postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    networks:
      - owui-net
    # Recommended: tune postgresql.conf for your hardware
    command: >
      postgres
      -c shared_buffers=2GB
      -c effective_cache_size=6GB
      -c work_mem=64MB
      -c maintenance_work_mem=512MB
      -c max_connections=200
      -c wal_level=replica
      -c max_wal_senders=3

  # ---------------------------------------------------------------------------
  # Redis - Session management and WebSocket coordination
  # ---------------------------------------------------------------------------
  redis:
    image: redis:7-alpine
    container_name: owui-redis
    restart: unless-stopped
    command: >
      redis-server
      --maxmemory 2gb
      --maxmemory-policy allkeys-lru
      --maxclients 10000
      --timeout 1800
      --save 60 1000
      --appendonly yes
    volumes:
      - redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - owui-net

  # ---------------------------------------------------------------------------
  # Ollama - Local model inference (smaller models, ≤13B)
  # ---------------------------------------------------------------------------
  ollama:
    image: ollama/ollama:latest
    container_name: owui-ollama
    restart: unless-stopped
    volumes:
      - ollama-data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    networks:
      - owui-net

  # ---------------------------------------------------------------------------
  # vLLM - GPU-optimized inference (large models, 70B+)
  # ---------------------------------------------------------------------------
  vllm:
    image: vllm/vllm-openai:latest
    container_name: owui-vllm
    restart: unless-stopped
    command: >
      --model ${VLLM_MODEL:-meta-llama/Llama-3.1-70B-Instruct}
      --tensor-parallel-size ${VLLM_TP_SIZE:-2}
      --max-model-len ${VLLM_MAX_MODEL_LEN:-8192}
      --gpu-memory-utilization 0.90
      --enforce-eager
      --api-key ${VLLM_API_KEY:-sk-none}
    environment:
      - HUGGING_FACE_HUB_TOKEN=${HF_TOKEN}
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: ${VLLM_TP_SIZE:-2}
              capabilities: [gpu]
    networks:
      - owui-net

  # ---------------------------------------------------------------------------
  # Open Terminal - Sandboxed code execution for scientists
  # ---------------------------------------------------------------------------
  open-terminal:
    image: ghcr.io/open-webui/open-terminal
    container_name: owui-terminal
    restart: unless-stopped
    environment:
      - OPEN_TERMINAL_API_KEY=${OPEN_TERMINAL_API_KEY}
      - OPEN_TERMINAL_PIP_PACKAGES=rdkit-pypi scikit-learn lifelines matplotlib seaborn
      - OPEN_TERMINAL_MAX_SESSIONS=16
    volumes:
      - terminal-data:/home/user
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: "4.0"
    networks:
      - owui-net

# =============================================================================
# Named Volumes
# =============================================================================
volumes:
  owui-data:
    driver: local
  postgres-data:
    driver: local
  redis-data:
    driver: local
  ollama-data:
    driver: local
  terminal-data:
    driver: local

# =============================================================================
# Network
# =============================================================================
networks:
  owui-net:
    driver: bridge
```

Save this as nginx/nginx.conf:
```nginx
events {
    worker_connections 1024;
}

http {
    upstream openwebui {
        least_conn;
        server open-webui-1:8080;
        server open-webui-2:8080;
    }

    # Redirect HTTP to HTTPS
    server {
        listen 80;
        return 301 https://$host$request_uri;
    }

    server {
        listen 443 ssl;
        server_name ai.yourcompany.com;

        ssl_certificate /etc/nginx/certs/fullchain.pem;
        ssl_certificate_key /etc/nginx/certs/privkey.pem;
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers HIGH:!aNULL:!MD5;

        # Security headers
        add_header Strict-Transport-Security "max-age=63072000; includeSubDomains" always;
        add_header X-Content-Type-Options "nosniff" always;
        add_header X-Frame-Options "SAMEORIGIN" always;
        add_header Referrer-Policy "strict-origin-when-cross-origin" always;

        # Max upload size for document ingestion
        client_max_body_size 100M;

        location / {
            proxy_pass http://openwebui;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            # WebSocket support (required for streaming responses)
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";

            # Timeouts for long-running LLM responses
            proxy_read_timeout 300s;
            proxy_send_timeout 300s;
        }
    }
}
```

Save this as setup.sh and run it before your first docker compose up:
```bash
#!/usr/bin/env bash
# =============================================================================
# Open WebUI - Production Setup Script
# =============================================================================
# This script creates the required directory structure, generates secrets,
# pulls initial models, and validates the environment before first boot.
#
# Usage: chmod +x setup.sh && ./setup.sh
# =============================================================================
set -euo pipefail

# --- Colors -----------------------------------------------------------------
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

info()  { echo -e "${GREEN}[INFO]${NC} $1"; }
warn()  { echo -e "${YELLOW}[WARN]${NC} $1"; }
error() { echo -e "${RED}[ERROR]${NC} $1"; exit 1; }

# --- Pre-flight checks ------------------------------------------------------
info "Running pre-flight checks..."
command -v docker >/dev/null 2>&1 || error "Docker is not installed."
# Compose v2 is a Docker CLI plugin, not a standalone binary, so probe it by running it.
docker compose version >/dev/null 2>&1 || error "Docker Compose v2 is not installed."
# Without the explicit fallback, "set -e" would abort silently if the daemon is unreachable.
DOCKER_VERSION=$(docker version --format '{{.Server.Version}}' 2>/dev/null) || error "Cannot reach the Docker daemon."
info "Docker version: ${DOCKER_VERSION}"

# Check for NVIDIA GPU (optional)
if command -v nvidia-smi >/dev/null 2>&1; then
  GPU_INFO=$(nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null || echo "GPU detected but nvidia-smi query failed")
  info "GPU detected: ${GPU_INFO}"
else
  warn "No NVIDIA GPU detected. Ollama will run on CPU (slower inference)."
  warn "vLLM requires a GPU and will not start without one."
fi

# --- Create directory structure ---------------------------------------------
info "Creating directory structure..."
mkdir -p nginx/certs
mkdir -p data/ollama
mkdir -p data/postgres
mkdir -p data/redis
mkdir -p data/open-webui
mkdir -p backups

# --- Generate secrets -------------------------------------------------------
info "Generating secrets..."
generate_secret() {
  openssl rand -base64 32 | tr -d '/+=' | head -c 48
}

if [ ! -f .env ]; then
  TERMINAL_KEY=$(generate_secret)
  # Unquoted heredoc: $(...) and ${...} are expanded, while quote characters are
  # written literally, so the JSON value below must not use backslash escapes.
  cat > .env << EOF
# =============================================================================
# Open WebUI - Environment Configuration
# Generated on $(date -u +"%Y-%m-%dT%H:%M:%SZ")
# =============================================================================
# --- Public URL ---
WEBUI_URL=https://ai.yourcompany.com
WEBUI_NAME=AI Assistant
# --- Secret key (used for JWT signing - KEEP THIS SECRET) ---
WEBUI_SECRET_KEY=$(generate_secret)
# --- Admin account (created on first startup) ---
ADMIN_EMAIL=admin@yourcompany.com
ADMIN_PASSWORD=$(generate_secret)
ADMIN_NAME=IT Admin
# --- PostgreSQL ---
POSTGRES_USER=openwebui
POSTGRES_PASSWORD=$(generate_secret)
POSTGRES_DB=openwebui
# --- vLLM ---
VLLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
VLLM_TP_SIZE=2
VLLM_MAX_MODEL_LEN=8192
VLLM_API_KEY=$(generate_secret)
HF_TOKEN=hf_your_token_here
# --- Open Terminal ---
OPEN_TERMINAL_API_KEY=${TERMINAL_KEY}
TERMINAL_SERVER_CONNECTIONS='[{"url":"http://open-terminal:8000","key":"${TERMINAL_KEY}"}]'
# --- Workers ---
UVICORN_WORKERS=4
# --- Observability (optional) ---
ENABLE_OTEL=False
OTEL_ENDPOINT=
# =============================================================================
# IMPORTANT: Update the following before deploying:
# 1. WEBUI_URL - your actual domain
# 2. ADMIN_EMAIL / ADMIN_PASSWORD - your admin credentials
# 3. HF_TOKEN - your Hugging Face token (for gated models like Llama)
# 4. Place TLS certs in ./nginx/certs/ (fullchain.pem + privkey.pem)
# =============================================================================
EOF
  info ".env file created. Review and update it before starting."
  warn "Generated admin password is in .env - save it securely."
else
  warn ".env already exists. Skipping generation."
fi

# --- Pull Ollama models -----------------------------------------------------
info "Pulling recommended Ollama models..."
info "(This may take a while on first run.)"

# Start Ollama temporarily to pull models
if docker compose ps ollama 2>/dev/null | grep -q "running"; then
  info "Ollama is already running."
else
  info "Starting Ollama service to pull models..."
  docker compose up -d ollama
  sleep 10 # Wait for Ollama to initialize
fi

# Pull models (adjust to your organization's needs)
MODELS=(
  "llama3.1:8b"       # Fast - summarization, Q&A, literature triage
  "nomic-embed-text"  # Embedding model for RAG
)
for model in "${MODELS[@]}"; do
  info "Pulling ${model}..."
  docker compose exec ollama ollama pull "${model}" || warn "Failed to pull ${model}. You can pull it later."
done
info "Models pulled. You can add more models later via:"
info "  docker compose exec ollama ollama pull <model-name>"

# --- Validate Docker Compose ------------------------------------------------
info "Validating Docker Compose configuration..."
docker compose config --quiet && info "Docker Compose configuration is valid." || error "Docker Compose validation failed."

# --- Summary ----------------------------------------------------------------
echo ""
echo "============================================================================="
echo " Setup complete!"
echo "============================================================================="
echo ""
echo " Next steps:"
echo " 1. Edit .env with your domain, admin credentials, and HF token"
echo " 2. Place TLS certificates in ./nginx/certs/"
echo "    - fullchain.pem (certificate chain)"
echo "    - privkey.pem (private key)"
echo " 3. Update nginx/nginx.conf server_name to match your domain"
echo " 4. Start the stack: docker compose up -d"
echo " 5. Access the UI at: https://ai.yourcompany.com"
echo ""
echo " To check service health: docker compose ps"
echo " To view logs: docker compose logs -f open-webui-1"
echo " To pull more models: docker compose exec ollama ollama pull <model>"
echo ""
echo "============================================================================="
```

These are the same Open WebUI environment variables used in any deployment. This section highlights the ones that organizations in regulated industries may find relevant. These descriptions explain what each setting does - they do not constitute compliance guidance. For the full reference, see the Open WebUI documentation.
Note: These descriptions explain the technical behavior of each setting. They do not constitute a compliance determination. Your organization's quality team must evaluate these controls as part of your own Computer System Validation (CSV) process.
| Variable | Value | What This Setting Does |
|---|---|---|
| `USER_PERMISSIONS_CHAT_DELETE` | `False` | Disables deletion of conversation records at the application level. |
| `USER_PERMISSIONS_CHAT_TEMPORARY` | `False` | Disables temporary chats, so AI interactions are recorded. |
| `ENABLE_ADMIN_CHAT_ACCESS` | `False` | Restricts IT administrators from viewing user conversation content at the application level. |
| `ENABLE_ADMIN_EXPORT` | `False` | Disables bulk extraction of conversation records at the application level. |
| Variable | Value | Rationale |
|---|---|---|
| `ENABLE_SIGNUP` | `False` | All users provisioned via SSO or admin. No uncontrolled account creation. |
| `DEFAULT_USER_ROLE` | `pending` | New SSO users require explicit admin approval before accessing any AI capabilities. |
| `BYPASS_MODEL_ACCESS_CONTROL` | `False` | Enforces RBAC model restrictions - users only see models assigned to their functional group. |
| `BYPASS_ADMIN_ACCESS_CONTROL` | `False` | Admins are subject to the same workspace access rules as regular users. |
| `ENABLE_COMMUNITY_SHARING` | `False` | No data, prompts, or model configurations shared to external community hubs. |
| Variable | Value | Rationale |
|---|---|---|
| `VECTOR_DB` | `pgvector` | Uses PostgreSQL's PGVector extension - one database for both application data and vector search. |
| `RAG_TOP_K` | `5` | Returns the top 5 most relevant document chunks. Tune based on document density. |
| `ENABLE_RAG_HYBRID_SEARCH` | `True` | BM25 + vector ensemble search with reranking. Recommended for scientific documents where exact terminology matters alongside semantic similarity. |
| Variable | Value | Rationale |
|---|---|---|
| `DATABASE_URL` | `postgresql://...` | Required for multi-node. SQLite cannot handle concurrent writes. |
| `REDIS_URL` | `redis://redis:6379/0` | Session coordination across stateless nodes. |
| `WEBSOCKET_MANAGER` | `redis` | Routes streaming responses through Redis for multi-node consistency. |
| `ENABLE_DB_MIGRATIONS` | `True` (node-1 only) | Only one node should run migrations on startup to prevent race conditions. |
| `TERMINAL_SERVER_CONNECTIONS` | JSON array | Pre-configures the Open Terminal connection so it's available to all users on startup. Format: `[{"url":"http://open-terminal:8000","key":"<API_KEY>"}]`. Can also be configured manually in Admin Settings → Integrations. |
Redis note: The `timeout 1800` setting in the Docker Compose Redis config is critical. Without it, idle connections accumulate until `maxclients` is exhausted and all logins fail. See the Open WebUI Redis documentation.
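To confirm that the idle timeout is actually in effect on a running instance, the value can be read back from Redis. The sketch below assumes the `redis` service name from the Compose file; `redis_timeout_enabled` is a hypothetical helper that parses the two-line output of `redis-cli CONFIG GET timeout` (setting name, then value).

```shell
# Reads `redis-cli CONFIG GET timeout` output on stdin (name on line 1, value
# on line 2) and succeeds only when the idle timeout is a positive integer.
# A value of 0 means idle connections are never closed.
redis_timeout_enabled() {
  local value
  value="$(sed -n '2p')"
  case "$value" in
    ''|*[!0-9]*) return 1 ;;   # missing or non-numeric
  esac
  [ "$value" -gt 0 ]
}

# Example (not run here):
#   docker compose exec -T redis redis-cli CONFIG GET timeout | redis_timeout_enabled \
#     || echo "Redis idle timeout is disabled"
```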
| Variable | Value | Rationale |
|---|---|---|
| `OPEN_TERMINAL_API_KEY` | Generated secret | Bearer API key for authenticating requests from Open WebUI to the terminal container. |
| `OPEN_TERMINAL_PIP_PACKAGES` | `rdkit-pypi scikit-learn lifelines matplotlib seaborn` | Pre-installs scientific Python libraries at container startup. Scientists can install additional packages at runtime. |
| `OPEN_TERMINAL_MAX_SESSIONS` | `16` | Maximum concurrent interactive terminal sessions. Prevents resource exhaustion. |
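The key in `OPEN_TERMINAL_API_KEY` must match the `key` field inside `TERMINAL_SERVER_CONNECTIONS`, so it can help to derive both from one value. The following is a small sketch: `terminal_connections_json` is a hypothetical helper, and it assumes the URL and key contain no characters that need JSON escaping (true for the alphanumeric secrets the setup script generates).

```shell
# Builds the JSON array expected by TERMINAL_SERVER_CONNECTIONS from one URL
# and one API key. Rejects values containing a double quote or a backslash
# rather than attempting to escape them.
terminal_connections_json() {
  local url="$1" key="$2"
  case "${url}${key}" in
    *'"'*|*'\'*) echo "value needs JSON escaping" >&2; return 1 ;;
  esac
  printf '[{"url":"%s","key":"%s"}]\n' "$url" "$key"
}

terminal_connections_json "http://open-terminal:8000" "example-key"
# prints [{"url":"http://open-terminal:8000","key":"example-key"}]
```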
Important
Open WebUI is a general-purpose AI platform. It is not a validated GxP system, and nothing in this guide should be interpreted as a compliance determination. When deployed with the configuration described here, Open WebUI provides technical controls that your organization's quality team can evaluate as part of their own Computer System Validation (CSV) effort. The mappings below are informational - your validation team must independently verify that each control meets your regulatory obligations.
All AI-generated content is unvalidated and must be reviewed by qualified personnel before use in any clinical, regulatory, or safety-critical context.
The following table lists technical capabilities that Open WebUI provides when configured as described in this guide. These are not compliance claims. Your validation team must independently determine whether these controls satisfy your specific regulatory obligations.
| Area | Technical Controls |
|---|---|
| Audit trail | Conversations are timestamped and attributed to an authenticated user. Chat deletion can be disabled at the application level via USER_PERMISSIONS_CHAT_DELETE=False. |
| System access | SSO/OIDC integration, role-based access control, DEFAULT_USER_ROLE=pending for approval workflow. |
| Authorization | RBAC restricts access to models, documents, and features by functional group. |
| Availability monitoring | Health checks, OpenTelemetry integration, and Redis session management support monitoring. |
| User identity | Individual user accounts are authenticated through SSO/OIDC. |
| Deployment model | Can be deployed on internal infrastructure with no external dependencies when models are pre-loaded. |
| Data integrity | Chat deletion can be disabled at the application level. PostgreSQL WAL for write-ahead logging. Automated backups. |
| Data migration | PostgreSQL pg_dump/pg_restore with integrity verification. Standard, well-documented process. |
| Business continuity | Stateless nodes with automatic failover, Redis HA, PostgreSQL WAL archiving for point-in-time recovery. |
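The "integrity verification" in the data migration row can be as simple as recording a checksum beside each dump and checking it before a restore. This is a minimal sketch using GNU `sha256sum`; the function names and the `backups/openwebui.dump` path are assumptions for illustration.

```shell
# write_checksum FILE -> records FILE's SHA-256 in FILE.sha256 (same directory)
write_checksum() {
  ( cd "$(dirname "$1")" && sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

# verify_checksum FILE -> succeeds only if FILE still matches FILE.sha256
verify_checksum() {
  ( cd "$(dirname "$1")" && sha256sum --status -c "$(basename "$1").sha256" )
}

# Example (not run here; service name and variables as used in this guide):
#   docker compose exec -T postgres pg_dump -U "$POSTGRES_USER" -Fc "$POSTGRES_DB" \
#     > backups/openwebui.dump
#   write_checksum backups/openwebui.dump
#   verify_checksum backups/openwebui.dump || echo "backup is corrupt or was modified"
```

A checksum only detects changes to the file after it was written; a periodic test restore with `pg_restore` is still needed to show that the dump itself is usable.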
If your organization uses a risk-based approach to CSV (Computer System Validation), the GAMP categorization, validation scope, and testing depth are decisions your validation team must make based on your specific deployment, customizations, and intended use. Open WebUI's open-source codebase and container-based deployment with version-pinned images may facilitate aspects of your validation process, but the validation strategy itself is an organizational responsibility.
After first deployment, you can configure functional groups via the Admin Panel. The following is an example workflow - your organization should design its own group structure based on its functional areas, risk profile, and governance requirements.
Navigate to Admin Panel → Settings → OAuth and configure your identity provider:
OPENID_PROVIDER_URL=https://login.yourcompany.com/.well-known/openid-configuration
OAUTH_CLIENT_ID=<your-client-id>
OAUTH_CLIENT_SECRET=<your-client-secret>
OAUTH_SCOPES=openid email profile groups
OAUTH_GROUP_CLAIM=groups
ENABLE_OAUTH_GROUP_MANAGEMENT=True
ENABLE_OAUTH_GROUP_CREATION=True
ENABLE_OAUTH_ROLE_MANAGEMENT=True
Tip: Set `ENABLE_OAUTH_GROUP_MANAGEMENT=True` so that functional group membership syncs automatically from your identity provider. When a scientist transfers from R&D to Medical Affairs in your directory, their Open WebUI permissions update on next login - no manual reprovisioning.
Navigate to Admin Panel → Groups and create groups matching your organization's structure:
- Biostatistics
  - Models: All available models
  - Knowledge bases: Analysis datasets, statistical analysis plans, CDISC standards libraries
  - Permissions: Open Terminal enabled, code interpreter enabled, file upload enabled
  - Rationale: Biostatisticians run survival analyses, enrollment dashboards, and forest plots. Open Terminal gives them a computational environment without requiring IT tickets.
- Clinical Development
  - Models: All available models
  - Knowledge bases: Study protocols, investigator brochures, CRF templates, monitoring plan libraries
  - Permissions: Document extraction enabled, file upload enabled, web search enabled
  - Rationale: Clinical teams work with both internal protocols and public clinical trial registries. Web search enables ClinicalTrials.gov lookups. Document extraction helps process regulatory correspondence.
- Manufacturing / CMC
  - Models: All available models
  - Knowledge bases: Batch records, process validation reports, equipment SOPs
  - Permissions: Open Terminal enabled, file upload enabled
  - Rationale: CMC scientists frequently upload batch records and deviation reports for AI-assisted root cause analysis. Open Terminal enables batch trend analysis and process parameter visualization.
- Medical Affairs
  - Models: All available models
  - Knowledge bases: Product monographs, congress abstracts, KOL slide decks
  - Permissions: Web search enabled, file upload enabled
  - Rationale: Medical Affairs teams need access to public literature and congress proceedings alongside internal medical information.
- Pharmacovigilance
  - Models: Reasoning models only (e.g., Llama 3.1 70B via vLLM)
  - Knowledge bases: MedDRA dictionaries, CIOMS forms, signal detection SOPs
  - Permissions: RAG-only mode (no web search, no file upload)
  - Rationale: PV work is safety-critical. Restricting to RAG-only mode prioritizes retrieval from curated internal documents and disables web search, reducing exposure to uncontrolled external content. The underlying model may still draw on its training data.
- R&D / Discovery
  - Models: All available models
  - Knowledge bases: Compound libraries, assay protocols, literature databases
  - Permissions: Open Terminal enabled, code interpreter enabled, file upload enabled, web search enabled
  - Rationale: Discovery scientists need the broadest toolset - running SAR analyses and molecular modeling in Open Terminal, uploading proprietary assay results, and searching public literature.
- Regulatory Affairs
  - Models: All available models
  - Knowledge bases: eCTD templates, FDA/EMA guidance, precedent correspondence
  - Permissions: Document extraction enabled, file upload enabled
  - Rationale: Regulatory scientists frequently need to extract structured data from FDA letters, EMA assessment reports, and deficiency notices.
- Support Staff
  - Models: Small models only (e.g., Llama 3.1 8B via Ollama)
  - Knowledge bases: Company policies, HR procedures, training materials
  - Permissions: No file upload, no web search, no terminal access
  - Rationale: Minimal access footprint for non-scientific users.
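Keeping the group design above as reviewable data makes it easier to diff against what is configured in the Admin Panel. The sketch below encodes the example permissions from the list; the function names and the short permission labels (`terminal`, `web_search`, and so on) are hypothetical shorthand, not Open WebUI identifiers.

```shell
# Prints the space-separated permissions for a functional group, mirroring the
# example list above. Labels are illustrative shorthand only.
group_permissions() {
  case "$1" in
    "Biostatistics")        echo "terminal code_interpreter file_upload" ;;
    "Clinical Development") echo "document_extraction file_upload web_search" ;;
    "Manufacturing / CMC")  echo "terminal file_upload" ;;
    "Medical Affairs")      echo "web_search file_upload" ;;
    "Pharmacovigilance")    echo "" ;;   # RAG-only
    "R&D / Discovery")      echo "terminal code_interpreter file_upload web_search" ;;
    "Regulatory Affairs")   echo "document_extraction file_upload" ;;
    "Support Staff")        echo "" ;;   # minimal footprint
    *) return 1 ;;                       # unknown group
  esac
}

# group_allows GROUP PERMISSION -> succeeds if the group has the permission
group_allows() {
  local perms
  perms="$(group_permissions "$1")" || return 1
  case " $perms " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `group_allows "Pharmacovigilance" web_search` fails while `group_allows "R&D / Discovery" web_search` succeeds, matching the list.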
For each model in Admin Panel → Models:
- Set visibility to Private (not Public)
- Under Access Control, add the groups that should have access
- Ensure `BYPASS_MODEL_ACCESS_CONTROL=False` in your environment so these restrictions are enforced
For each knowledge base in Admin Panel → Knowledge:
- Set access control to the relevant functional groups
- Users will only see knowledge bases assigned to their group(s) in the chat interface
The Inline Visualizer plugin renders interactive HTML/SVG visualizations directly in the chat. It includes a theme-aware design system with color ramps, SVG utility classes, and a communication bridge that lets visualizations send prompts back to the chat for conversational exploration.
This plugin has two components:
| Component | File | Install Location |
|---|---|---|
| Tool | `tool.py` | Workspace → Tools |
| Skill | `SKILL.md` | Workspace → Knowledge → Create Skill |
Install the Tool:
- Copy the contents of `tool.py`
- In Open WebUI, go to Workspace → Tools → + Create New
- Paste the code and click Save
Install the Skill:
- Copy the contents of `SKILL.md`
- In Open WebUI, go to Workspace → Knowledge → + Create Skill
- Name it `visualize` (this exact name is required)
- Paste the contents and click Save
Attach to Models:
- Go to Admin Panel → Models and edit each model that should support visualizations
- Under Tools, enable the Inline Visualizer tool
- Under Skills, attach the visualize skill
- Ensure native function calling is enabled for the model
- Save
Enable Interactive Features (Optional):
- Go to Settings → Interface
- Enable iframe Sandbox Allow Same Origin
Without this, visualizations render normally but interactive buttons that send prompts back to the chat (sendPrompt) will not work.
Tip: A strong model is required for complex, visually detailed interactive visualizations. Tested with Claude Haiku 4.5 and Claude Opus 4.5.
Open WebUI's RAG system ingests documents and creates searchable vector embeddings in PGVector. This section provides an example knowledge base design for pharmaceutical contexts.
| Knowledge Base | Contents | Functional Groups | Notes |
|---|---|---|---|
| Compound Library | Structures, SAR data, screening results, MoA summaries | R&D / Discovery | High sensitivity - restrict strictly to R&D |
| Assay Protocols | Standard assay procedures, validation data, reference standards | R&D / Discovery | |
| Clinical Protocols | Study protocols, ICH E6/E8/E9 references, SAPs | Clinical Development | |
| CRF Templates | Case report forms, data management plans, reconciliation guides | Clinical Development | |
| Statistical Methods | SAPs, CDISC standards, analysis dataset specifications | Biostatistics | |
| Regulatory Guidance | FDA guidance library, EMA guidelines, ICH harmonized guidelines | Regulatory Affairs | Consider splitting by region (FDA/EMA/PMDA) |
| Submission Templates | eCTD module templates, cover letters, precedent review correspondence | Regulatory Affairs | |
| PV Reference | MedDRA hierarchy, CIOMS forms, signal detection SOPs, PSUR templates | Pharmacovigilance | Reviewed documents only - no drafts |
| Manufacturing SOPs | Batch records, process validation reports, equipment qualification docs | Manufacturing / CMC | |
| Medical Information | Product monographs, SmPCs, congress posters, medical response letters | Medical Affairs | |
| Company Policies | HR handbook, compliance policies, IT security procedures, training guides | All groups | |
1. Navigate to Workspace → Knowledge → Create Knowledge Base
2. Name the knowledge base and set the access control to the relevant functional group(s)
3. Upload documents (supported formats: PDF, DOCX, TXT, Markdown, HTML, CSV, XLSX, PPTX)
4. Open WebUI automatically:
   - Extracts text from uploaded documents
   - Chunks the content for optimal retrieval
   - Generates vector embeddings and stores them in PGVector
5. Users in the assigned groups can now reference this knowledge base in chat by typing `#` followed by the knowledge base name
- Regulatory submissions: Large eCTD modules should be split by section (e.g., upload Module 2.3 Quality Overall Summary separately from Module 3.2.P Drug Product). Section-scoped uploads tend to improve retrieval precision because each chunk carries less unrelated context.
- SOPs and batch records: These are typically well-structured documents that RAG handles effectively. Use descriptive filenames that include the SOP number and revision (e.g., `SOP-MFG-042-Rev3-Tablet-Compression.pdf`).
- Literature databases: For large literature collections (1,000+ papers), consider organizing into topic-specific knowledge bases rather than one monolithic collection. This lets users target their retrieval.
- Citation verification: RAG provides relevance scores with each retrieved chunk. Scientists must always verify citations against the source - RAG reduces hallucination but does not eliminate it. All AI-generated content must be reviewed by qualified personnel before use in any clinical, regulatory, or safety-critical context. This is especially critical for PV and regulatory use cases.
- Version control: When SOPs are revised or guidance documents update, upload the new version and remove the old one. Knowledge bases can be updated without downtime. Maintain a document version log outside Open WebUI for your QMS.
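
The filename convention suggested for SOPs can be checked before documents are uploaded. The following is a minimal sketch, assuming the pattern `SOP-<AREA>-<NUMBER>-Rev<N>-<Title>.pdf` generalizes from the single example above; the helper names and the regular expression are illustrative and are not part of Open WebUI.

```bash
# Hypothetical helper: succeed only if the filename follows
# SOP-<AREA>-<NUMBER>-Rev<N>-<Title>.pdf (pattern assumed from the example above).
is_valid_sop_filename() {
  printf '%s\n' "$1" | grep -Eq '^SOP-[A-Z]+-[0-9]+-Rev[0-9]+-[A-Za-z0-9-]+\.pdf$'
}

# Print the names of files in a staging directory that do not match the convention.
list_nonconforming_sops() {
  local path name
  for path in "$1"/*; do
    [ -e "$path" ] || continue
    name=$(basename "$path")
    is_valid_sop_filename "$name" || printf '%s\n' "$name"
  done
}

# Usage (the directory is a placeholder):
#   list_nonconforming_sops /srv/staging/manufacturing-sops
```

Running a check like this on a staging folder before upload keeps revision numbers visible in retrieved citations. Adjust the pattern to your own document numbering scheme.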
The Docker Compose stack pulls nomic-embed-text via Ollama for generating embeddings locally. Configure this in Admin Panel → Settings → Documents → Embedding Model.
For higher-quality embeddings (recommended for 10,000+ document deployments), consider using a dedicated embedding endpoint. Set RAG_OPENAI_API_BASE_URL to point to a self-hosted embedding service or use Ollama's built-in embedding support (both options keep data on your infrastructure when configured accordingly).
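
As a sketch, the local embedding setup described above might correspond to the following `.env` entries. The variable names are believed to match Open WebUI's environment variable reference, but verify them against the version you deploy. The `ollama` hostname assumes that is the Compose service name, and the commented alternative URL is a placeholder. Settings saved in the Admin Panel may take precedence over environment values.

```bash
# .env fragment (assumed values; adjust to your stack)
# Local embeddings through the Ollama container
RAG_EMBEDDING_ENGINE=ollama
RAG_EMBEDDING_MODEL=nomic-embed-text
# Hostname assumes the Compose service is named "ollama"
RAG_OLLAMA_BASE_URL=http://ollama:11434

# Alternative: a self-hosted OpenAI-compatible embedding endpoint (placeholder URL)
# RAG_EMBEDDING_ENGINE=openai
# RAG_OPENAI_API_BASE_URL=http://embeddings.internal:8080/v1
```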
The following checklist describes operational security measures. This is not a compliance checklist - your organization's quality, security, and compliance teams should determine which items apply to your environment and what additional measures are needed.
- TLS 1.2+ enforced on the reverse proxy - no plaintext HTTP traffic reaches Open WebUI
- Only port 443 is exposed to the user network; all other services are on internal Docker network
- HSTS header set with `max-age=63072000; includeSubDomains`
- Security headers: `X-Content-Type-Options: nosniff`, `X-Frame-Options: SAMEORIGIN`, `Referrer-Policy: strict-origin-when-cross-origin`
- Rate limiting configured on the reverse proxy to prevent abuse
- DNS resolves only to the reverse proxy - no direct access to application nodes
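
The header items above can be spot-checked from any machine that reaches the reverse proxy. This is a minimal sketch: `check_security_headers` is a hypothetical helper that reads response headers on standard input and reports which of the four headers are absent. It checks only for presence, not for the exact values listed above.

```bash
# Hypothetical helper: read HTTP response headers on stdin and report which
# checklist headers are missing. Returns non-zero if any are absent.
check_security_headers() {
  local headers name missing
  # Normalize line endings and case so HTTP/1.1 and HTTP/2 output both match.
  headers=$(tr -d '\r' | tr '[:upper:]' '[:lower:]')
  missing=0
  for name in strict-transport-security x-content-type-options \
              x-frame-options referrer-policy; do
    if ! printf '%s\n' "$headers" | grep -q "^${name}:"; then
      echo "[MISSING] ${name}"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (hostname is a placeholder):
#   curl -sI https://chat.example.internal | check_security_headers
```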
- `ENABLE_SIGNUP=False` - no self-registration
- `DEFAULT_USER_ROLE=pending` - new SSO users require admin approval
- SSO/OIDC configured with your organization's identity provider
- `ENABLE_OAUTH_GROUP_MANAGEMENT=True` - groups sync from IdP
- `ENABLE_OAUTH_ROLE_MANAGEMENT=True` - roles sync from IdP
- `BYPASS_MODEL_ACCESS_CONTROL=False` - RBAC enforced on model access
- `BYPASS_ADMIN_ACCESS_CONTROL=False` - admins subject to workspace ACLs
- `ENABLE_ADMIN_CHAT_ACCESS=False` - restricts IT administrators from viewing user conversation content at the application level
- `ENABLE_ADMIN_EXPORT=False` - disables bulk data extraction at the application level
- `USER_PERMISSIONS_CHAT_DELETE=False` - disables chat deletion at the application level
- `USER_PERMISSIONS_CHAT_TEMPORARY=False` - no unlogged conversations
- `ENABLE_COMMUNITY_SHARING=False` - no external data sharing
- PostgreSQL configured with encryption at rest (transparent data encryption or full-disk encryption on the host)
- Redis `requirepass` set if Redis is network-accessible (not needed when Redis is internal-only via Docker network)
- Backup encryption enabled (see Backup & Disaster Recovery)
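
The environment-variable items in the checklist above lend themselves to an automated check. The following is a minimal sketch rather than an Open WebUI tool: `check_env_settings` is a hypothetical helper that compares a `.env` file against the values listed above, ignoring case, and reports mismatches. It assumes simple unquoted `KEY=value` lines.

```bash
# Hypothetical helper: compare an env file against the checklist values.
# Prints one line per mismatch and returns non-zero if any are found.
# Assumes unquoted KEY=value lines; if a key repeats, the last one is compared.
check_env_settings() {
  local env_file expected key want got failed
  env_file=$1
  failed=0
  for expected in \
    ENABLE_SIGNUP=False \
    DEFAULT_USER_ROLE=pending \
    ENABLE_OAUTH_GROUP_MANAGEMENT=True \
    ENABLE_OAUTH_ROLE_MANAGEMENT=True \
    BYPASS_MODEL_ACCESS_CONTROL=False \
    BYPASS_ADMIN_ACCESS_CONTROL=False \
    ENABLE_ADMIN_CHAT_ACCESS=False \
    ENABLE_ADMIN_EXPORT=False \
    USER_PERMISSIONS_CHAT_DELETE=False \
    USER_PERMISSIONS_CHAT_TEMPORARY=False \
    ENABLE_COMMUNITY_SHARING=False
  do
    key=${expected%%=*}
    want=$(printf '%s' "${expected#*=}" | tr '[:upper:]' '[:lower:]')
    got=$(sed -n "s/^${key}=//p" "$env_file" | tail -n 1 \
      | tr -d '\r' | tr '[:upper:]' '[:lower:]')
    if [ "$got" != "$want" ]; then
      echo "[MISMATCH] ${key}: expected '${want}', found '${got:-<unset>}'"
      failed=1
    fi
  done
  return "$failed"
}

# Usage: check_env_settings .env || echo "Review the settings listed above"
```

A check like this only confirms what the file says. Values changed later through the Admin Panel would need to be verified in the running application.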
- When configured for local-only inference, all models run via Ollama or vLLM on your infrastructure
- Hugging Face token is stored only in `.env`, not committed to version control
- `.env` file has restrictive permissions: `chmod 600 .env`
- For production deployments, consider migrating secrets from `.env` to a dedicated secrets manager (e.g., HashiCorp Vault, AWS Secrets Manager)
- vLLM API key (`VLLM_API_KEY`) is set to prevent unauthorized direct access to the inference endpoint
- Docker image tags pinned to specific versions (not `:main` or `:latest`) for reproducible, auditable deployments
- If Functions are used: LLM-Guard or equivalent function installed for prompt injection scanning
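
The two `.env` items above (restrictive permissions, not committed to version control) can be verified with a pair of small helpers. This is a sketch under stated assumptions: the function names are hypothetical, and `stat -c` is the GNU coreutils form (on BSD or macOS the equivalent is `stat -f %Lp`).

```bash
# Hypothetical helper: succeed only if the file mode is exactly 600.
# Uses GNU stat (-c); on BSD/macOS use `stat -f %Lp` instead.
env_file_is_private() {
  [ "$(stat -c '%a' "$1")" = "600" ]
}

# Succeed if the file is tracked by git in the current repository,
# which for .env would mean secrets are in version control.
env_file_is_tracked() {
  git ls-files --error-unmatch -- "$1" >/dev/null 2>&1
}

# Usage, from the project directory:
#   env_file_is_private .env || echo "[WARN] run: chmod 600 .env"
#   env_file_is_tracked .env && echo "[WARN] .env is tracked by git"
```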
- `OPEN_TERMINAL_API_KEY` is set - without it, anyone who can reach the port has full access
- Open Terminal container is on the internal Docker network only - not exposed to external traffic
- Resource limits applied: `memory: 4G` and `cpus: 4.0` (or appropriate for your environment)
- `OPEN_TERMINAL_MAX_SESSIONS=16` to prevent resource exhaustion from concurrent terminal sessions
- Docker socket is not mounted (`/var/run/docker.sock`) - unless explicitly required and the environment is trusted
- Named volume mounted at `/home/user` for file persistence across container restarts
- Open Terminal access restricted to appropriate functional groups via Admin Settings → Integrations
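
The API keys named in this checklist (`OPEN_TERMINAL_API_KEY`, `VLLM_API_KEY`) should be long random values. One way to produce them, sketched with a hypothetical helper that needs only `/dev/urandom` and `od`; `openssl rand -hex 32` gives an equivalent result where OpenSSL is installed.

```bash
# Hypothetical helper: print a 64-character hex key built from 32 random bytes.
generate_api_key() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Usage: append a fresh key to .env
#   printf 'OPEN_TERMINAL_API_KEY=%s\n' "$(generate_api_key)" >> .env
```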
- `ENABLE_DB_MIGRATIONS=True` on exactly one node; `False` on all others
- Redis `maxclients` set to 10000+ and `timeout` set to 1800
- Log aggregation configured (OpenTelemetry, Splunk, Datadog, or equivalent)
- Alerting on container restarts, database connection failures, and GPU memory exhaustion
- Docker image tags pinned to specific versions in production (not `:main` or `:latest`)
- Regular security updates for base images (`docker compose pull && docker compose up -d`)
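
The image-pinning item can be enforced in CI or a pre-deploy step. The following is a minimal sketch: `find_unpinned_images` is a hypothetical helper that scans a Compose file for `image:` references tagged `:main` or `:latest`, or carrying no tag at all. It is a text scan, not a YAML parser, so references built from variables (for example `${TAG}`) are not evaluated.

```bash
# Hypothetical helper: print compose `image:` references that are unpinned,
# i.e. tagged :main or :latest, or carrying no tag or digest at all.
find_unpinned_images() {
  sed -n 's/^[[:space:]]*image:[[:space:]]*//p' "$1" \
    | tr -d "\"'" \
    | awk 'NF == 0 { next }
      {
        ref = $1
        n = split(ref, parts, "/")
        last = parts[n]          # the final path component holds the tag
        if (last !~ /[:@]/ || last ~ /:(main|latest)$/) print ref
      }'
}

# Usage: fail a pipeline step if anything is unpinned
#   [ -z "$(find_unpinned_images docker-compose.yml)" ] || exit 1
```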
| Component | Data Location | Backup Method |
|---|---|---|
| PostgreSQL | `postgres-data` volume | `pg_dump` (logical) or continuous WAL archiving |
| Redis | `redis-data` volume | AOF + RDB snapshots (handled by Redis config) |
| Ollama models | `ollama-data` volume | Volume snapshot or re-pull (models are public) |
| Open WebUI data | `owui-data` volume | Volume snapshot |
| Open Terminal data | `terminal-data` volume | Volume snapshot (or ephemeral - rebuild on demand) |
| Configuration | `.env`, `nginx/`, `docker-compose.yml` | Git repository (exclude secrets) |
| TLS certificates | `nginx/certs/` | Certificate management system |
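
For the rows that list "Volume snapshot", one common approach is to archive the named volume from a throwaway container. This is a sketch, not the only method: `snapshot_volume` is a hypothetical helper, the `alpine:3.20` image is an arbitrary pinned choice, and the destination must be an absolute host path. Docker Compose usually prefixes volume names with the project name (for example `openwebui_owui-data`), so confirm the actual name with `docker volume ls`.

```bash
# Hypothetical helper: archive a named Docker volume to a tarball using a
# throwaway container. With DRY_RUN=1 the command is printed, not executed.
snapshot_volume() {
  local volume dest stamp
  volume=$1
  dest=$2
  stamp=$(date -u +%Y%m%d_%H%M%S)
  # Mount the volume read-only and write the archive to the host directory.
  set -- docker run --rm \
    -v "${volume}:/source:ro" \
    -v "${dest}:/backup" \
    alpine:3.20 \
    tar czf "/backup/${volume}_${stamp}.tar.gz" -C /source .
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage (volume name and path are placeholders):
#   DRY_RUN=1 snapshot_volume openwebui_owui-data /opt/openwebui/backups
```

Archiving a volume while its service is writing to it can capture an inconsistent state, so consider stopping the service first or using storage-level snapshots.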
Add this to your crontab (crontab -e) or scheduling system:
```bash
#!/usr/bin/env bash
# Daily PostgreSQL backup - run via cron at 02:00 UTC
# 0 2 * * * /opt/openwebui/backup-postgres.sh
set -euo pipefail

# Assumption: docker-compose.yml lives in /opt/openwebui. Cron does not start
# in the project directory, so change into it before calling docker compose.
cd /opt/openwebui

BACKUP_DIR="/opt/openwebui/backups"
RETENTION_DAYS=30
TIMESTAMP=$(date -u +"%Y%m%d_%H%M%S")
BACKUP_FILE="${BACKUP_DIR}/openwebui_${TIMESTAMP}.sql.gz"

mkdir -p "${BACKUP_DIR}"

# --format=custom writes a compressed pg_dump archive that is restored with
# pg_restore; despite the .sql.gz suffix it is not gzipped SQL.
docker compose exec -T postgres pg_dump \
  -U "${POSTGRES_USER:-openwebui}" \
  -d "${POSTGRES_DB:-openwebui}" \
  --format=custom \
  --compress=9 \
  > "${BACKUP_FILE}"

# Verify backup integrity (pg_restore runs on the host against the host-side file)
if pg_restore --list "${BACKUP_FILE}" > /dev/null 2>&1; then
  echo "[OK] Backup verified: ${BACKUP_FILE}"
else
  # Exit non-zero so cron or monitoring notices, and skip pruning older backups.
  echo "[ERROR] Backup verification failed: ${BACKUP_FILE}" >&2
  exit 1
fi

# Prune old backups
find "${BACKUP_DIR}" -name "openwebui_*.sql.gz" -mtime "+${RETENTION_DAYS}" -delete

echo "[INFO] Backup complete. Size: $(du -h "${BACKUP_FILE}" | cut -f1)"
```

To restore from a backup:
- Stop Open WebUI nodes: `docker compose stop open-webui-1 open-webui-2`
- Restore PostgreSQL: `docker compose exec -T postgres pg_restore -U openwebui -d openwebui --clean < backup.sql.gz`
- Restart: `docker compose up -d`
- Verify: Check health endpoints and run a test query
| Scenario | RPO (Data Loss) | RTO (Downtime) |
|---|---|---|
| Single node failure | 0 (stateless, auto-recovered) | < 30 seconds (health check interval) |
| Database corruption | ≤ 24 hours (daily backups) | < 1 hour |
| Full infrastructure loss | ≤ 24 hours | 2–4 hours (restore from backups) |
| With WAL archiving | ≤ 5 minutes | < 1 hour |
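
The 24-hour RPO in the table holds only if the daily backup actually ran. A monitoring job can confirm that a recent backup file exists. The sketch below uses a hypothetical helper, `has_recent_backup`, which assumes the `openwebui_*.sql.gz` naming used by the backup script and relies on `find -mmin`, available in GNU and BSD find.

```bash
# Hypothetical helper: succeed if at least one backup file in the directory
# was modified within the last N minutes (default 1440 = 24 hours).
has_recent_backup() {
  local dir max_age_min
  dir=$1
  max_age_min=${2:-1440}
  [ -n "$(find "$dir" -name 'openwebui_*.sql.gz' -mmin "-${max_age_min}" -print 2>/dev/null | head -n 1)" ]
}

# Usage: alert when the newest backup is older than 26 hours
#   has_recent_backup /opt/openwebui/backups 1560 || echo "[ALERT] backup is stale"
```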
For mission-critical deployments, enable PostgreSQL WAL archiving for point-in-time recovery with an RPO of minutes rather than hours.
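
As an illustration of what enabling WAL archiving involves, the helper below prints a `postgresql.conf` fragment based on the pattern in PostgreSQL's continuous-archiving documentation. It is a sketch: `print_wal_archive_conf` is a hypothetical name, the archive directory must be a path writable inside the PostgreSQL container, and `archive_timeout = 300` is chosen here to line up with the 5-minute RPO row in the table. Point-in-time recovery also requires a base backup (for example via `pg_basebackup`), and many production setups use a dedicated tool such as pgBackRest, Barman, or WAL-G instead of a plain `cp`.

```bash
# Hypothetical helper: print a postgresql.conf fragment that enables WAL
# archiving to a local directory. archive_timeout forces a segment switch at
# least every 300 seconds, bounding how much WAL can remain unarchived.
print_wal_archive_conf() {
  local archive_dir
  archive_dir=$1
  cat <<EOF
wal_level = replica
archive_mode = on
archive_command = 'test ! -f ${archive_dir}/%f && cp %p ${archive_dir}/%f'
archive_timeout = 300
EOF
}

# Usage (path is a placeholder for a mounted archive volume):
#   print_wal_archive_conf /var/lib/postgresql/wal-archive
```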
This guide is maintained alongside What Would It Take for a Pharma Company to Run AI On Its Own Infrastructure?. For questions about enterprise deployment, contact sales@openwebui.com.



