Merged

19 commits
8b12cbf
feat: update staging compose for demo/preview deployments
mihow Mar 18, 2026
b3d9771
fix: upgrade gunicorn 20.1.0 → 23.0.0
mihow Mar 18, 2026
1bb2646
fix: address PR review feedback
mihow Mar 18, 2026
d5e8f6f
fix: remove host port bindings from internal services
mihow Mar 18, 2026
7f75f98
docs: add staging deployment guide
mihow Mar 18, 2026
6b699b7
fix: increase DATA_UPLOAD_MAX_MEMORY_SIZE for ML worker results
mihow Mar 24, 2026
91a7f49
fix(celery): use Redis DB 1 for result backend, separate from cache
mihow Mar 25, 2026
c845143
docs(celery): add comments explaining CELERY_RESULT_EXTENDED impact
mihow Mar 25, 2026
14c26f5
fix(redis): add redis.conf, disable bgsave, add CELERY_RESULT_EXPIRES
mihow Mar 26, 2026
bd649a9
Merge branch 'main' into feat/update-staging-compose
mihow Apr 1, 2026
f12610f
chore(staging): add .compose-example and deploy.sh script
mihow Apr 1, 2026
fea3af2
fix(settings): use urllib.parse for Redis DB URL rewriting
mihow Apr 1, 2026
87b8c12
docs(staging): add reverse proxy section with nginx example
mihow Apr 1, 2026
1f2db69
fix(staging): deploy.sh should pull, not just fetch
mihow Apr 1, 2026
47e9fdd
fix(staging): add branch/host echo to deploy.sh before deploying
mihow Apr 1, 2026
095c6fa
docs(staging): clarify staging vs production, explain .envs/.producti…
mihow Apr 1, 2026
9a3b020
docs(staging): note potential rename to demo in future release
mihow Apr 1, 2026
076258f
fix: sort stdlib imports in base.py (isort)
mihow Apr 1, 2026
d0a8f1d
fix: unused import
mihow Apr 1, 2026
170 changes: 170 additions & 0 deletions compose/staging/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,170 @@
# Staging Deployment

Deploy the Antenna platform with local Redis, RabbitMQ, and NATS containers.
The database always runs outside the app stack — a dedicated server, a managed
service, or the optional local Postgres container included here.

## Quick Start (single instance)

### 1. Configure environment files

Copy the examples and fill in the values:

```bash
# Django settings
cp .envs/.production/.django-example .envs/.production/.django

# Database credentials
cat > .envs/.production/.postgres << 'EOF'
POSTGRES_HOST=db
POSTGRES_PORT=5432
POSTGRES_DB=antenna_staging
POSTGRES_USER=antenna
POSTGRES_PASSWORD=<generate-a-password>
EOF

# Database host IP
cat > .envs/.production/.compose << 'EOF'
DATABASE_IP=host-gateway
EOF
```

Key settings to configure in `.envs/.production/.django`:

| Variable | Example | Notes |
|---|---|---|
| `DJANGO_SECRET_KEY` | `<random-string>` | Generate with `python -c "from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())"` |
| `DJANGO_ALLOWED_HOSTS` | `*` or `api.staging.example.com` | |
| `REDIS_URL` | `redis://redis:6379/0` | Always use `redis` hostname (local container) |
| `CELERY_BROKER_URL` | `amqp://antenna:password@rabbitmq:5672/` | Always use `rabbitmq` hostname |
| `RABBITMQ_DEFAULT_USER` | `antenna` | Must match the user in `CELERY_BROKER_URL` |
| `RABBITMQ_DEFAULT_PASS` | `<password>` | Must match the password in `CELERY_BROKER_URL` |
| `NATS_URL` | `nats://nats:4222` | Always use `nats` hostname |
| `CELERY_FLOWER_USER` | `flower` | Basic auth for the Flower web UI |
| `CELERY_FLOWER_PASSWORD` | `<password>` | |
| `SENDGRID_API_KEY` | `placeholder` | Set a real key to enable email, or any non-empty string to skip |
| `DJANGO_AWS_STORAGE_BUCKET_NAME` | `my-bucket` | S3-compatible object storage for media/static files |
| `DJANGO_SUPERUSER_EMAIL` | `admin@example.com` | Used by `create_demo_project` and `createsuperuser --noinput` |
| `DJANGO_SUPERUSER_PASSWORD` | `<password>` | Used by `create_demo_project` and `createsuperuser --noinput` |
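If you prefer to write the file from scratch instead of copying the example, a minimal `.django` could look like this (all values are placeholders — replace every `change-me` before deploying):

```bash
mkdir -p .envs/.production
cat > .envs/.production/.django << 'EOF'
DJANGO_SECRET_KEY=change-me
DJANGO_ALLOWED_HOSTS=*
REDIS_URL=redis://redis:6379/0
CELERY_BROKER_URL=amqp://antenna:change-me@rabbitmq:5672/
RABBITMQ_DEFAULT_USER=antenna
RABBITMQ_DEFAULT_PASS=change-me
NATS_URL=nats://nats:4222
CELERY_FLOWER_USER=flower
CELERY_FLOWER_PASSWORD=change-me
SENDGRID_API_KEY=placeholder
DJANGO_AWS_STORAGE_BUCKET_NAME=my-bucket
DJANGO_SUPERUSER_EMAIL=admin@example.com
DJANGO_SUPERUSER_PASSWORD=change-me
EOF
```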

### 2. Start the database

If you have an external database, set `DATABASE_IP` in `.envs/.production/.compose`
to its IP address and skip this step.

For a local database container:

```bash
docker compose -f compose/staging/docker-compose.db.yml up -d

# Set DATABASE_IP to reach the host-published port from app containers
echo "DATABASE_IP=host-gateway" > .envs/.production/.compose
```

Verify the database is ready:

```bash
docker compose -f compose/staging/docker-compose.db.yml logs
# Should show: "database system is ready to accept connections"
```

### 3. Build and start the app

```bash
docker compose -f docker-compose.staging.yml \
--env-file .envs/.production/.compose build django

docker compose -f docker-compose.staging.yml \
--env-file .envs/.production/.compose up -d
```

### 4. Run migrations and create an admin user

```bash
# Shorthand for the compose command
COMPOSE="docker compose -f docker-compose.staging.yml --env-file .envs/.production/.compose"

# Apply database migrations
$COMPOSE run --rm django python manage.py migrate

# Create demo project with sample data and admin user
$COMPOSE run --rm django python manage.py create_demo_project

# Or just create an admin user without sample data
$COMPOSE run --rm django python manage.py createsuperuser --noinput
```

### 5. Verify

```bash
# API root
curl http://localhost:5001/api/v2/

# Django admin
# Open http://localhost:5001/admin/ in a browser

# Flower (Celery monitoring)
# Open http://localhost:5550/ in a browser

# NATS health (internal, but reachable via docker exec)
docker compose -f docker-compose.staging.yml \
--env-file .envs/.production/.compose \
exec nats wget -qO- http://localhost:8222/healthz
```

## Multiple Instances on the Same Host

Internal services (Redis, RabbitMQ, NATS) don't publish host ports, so they
never conflict between instances. Each compose project gets its own isolated
Docker network.

Only Django and Flower publish host ports. Override them with environment
variables and use a unique project name (`-p`):

```bash
# Instance A (defaults: Django on 5001, Flower on 5550)
docker compose -p antenna-main \
-f docker-compose.staging.yml \
--env-file .envs/.production/.compose up -d

# Instance B (custom ports)
DJANGO_PORT=5002 FLOWER_PORT=5551 \
docker compose -p antenna-feature-xyz \
-f docker-compose.staging.yml \
--env-file path/to/other/.compose up -d
```

Each instance needs its own:
- `.envs/.production/.compose` (can share `DATABASE_IP` if using the same DB server)
- `.envs/.production/.postgres` (use a different `POSTGRES_DB` per instance)
- `.envs/.production/.django` (can share most settings, but use unique `DJANGO_SECRET_KEY`)

If using the local database container, each instance needs its own DB container
too (or share one by creating multiple databases in it).
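When sharing one container, additional databases can be created inside it. A sketch (the `antenna_feature_xyz` name is hypothetical; assumes the `antenna` user from `.envs/.production/.postgres`), run via `psql` inside the `postgres` service:

```sql
-- Isolated database for a second instance; name it after the branch/preview
CREATE DATABASE antenna_feature_xyz OWNER antenna;
```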

## Stopping and Cleaning Up

```bash
# Stop the app stack
docker compose -f docker-compose.staging.yml \
--env-file .envs/.production/.compose down

# Stop the local database (data is preserved in a Docker volume)
docker compose -f compose/staging/docker-compose.db.yml down

# Remove everything including database data
docker compose -f compose/staging/docker-compose.db.yml down -v
```

## Database Options

The staging compose supports any PostgreSQL database reachable by IP:

| Option | `DATABASE_IP` | Notes |
|---|---|---|
| Local container | `host-gateway` | Use `compose/staging/docker-compose.db.yml` |
| Dedicated VM | `<server-ip>` | Best performance for shared environments |
| Managed service | `<service-ip>` | Cloud-hosted PostgreSQL |

Set `POSTGRES_HOST=db` in `.envs/.production/.postgres` — the `extra_hosts`
directive in the compose file maps `db` to whatever `DATABASE_IP` resolves to.
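The mapping is roughly this shape (a sketch — see `docker-compose.staging.yml` for the actual service definitions):

```yaml
services:
  django:
    extra_hosts:
      # "db" resolves to whatever DATABASE_IP is set to in .envs/.production/.compose
      - "db:${DATABASE_IP}"
```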
38 changes: 38 additions & 0 deletions compose/staging/docker-compose.db.yml
@@ -0,0 +1,38 @@
# Optional local PostgreSQL for staging environments.
#
# Use this when you don't have an external database (e.g., for local testing
# or isolated branch previews). Publishes PostgreSQL on localhost:5432.
#
# Usage:
# # Start the database first
# docker compose -f compose/staging/docker-compose.db.yml up -d
#
# # Then start the app stack
# docker compose -f docker-compose.staging.yml --env-file .envs/.production/.compose up -d
#
# The app connects to the database via extra_hosts (db → DATABASE_IP).
# Set DATABASE_IP to the Docker bridge gateway so the app container can
# reach the host-published port:
#
# .envs/.production/.compose:
# DATABASE_IP=host-gateway # Recommended (resolves to host on all platforms)
#
# .envs/.production/.postgres:
# POSTGRES_HOST=db # resolves via extra_hosts to DATABASE_IP

volumes:
staging_postgres_data: {}

services:
postgres:
build:
context: ../../
dockerfile: ./compose/local/postgres/Dockerfile
volumes:
- staging_postgres_data:/var/lib/postgresql/data
- ../../data/db/snapshots:/backups
env_file:
- ../../.envs/.production/.postgres
ports:
- "127.0.0.1:5432:5432"
restart: always
38 changes: 38 additions & 0 deletions compose/staging/redis.conf
@@ -0,0 +1,38 @@
# Redis configuration for staging/demo environments
#
# This is a minimal config for Redis running as a Docker container alongside
# the app. A production deployment would typically use a separate Redis server
# with its own config tuned for the available resources.
#
# Redis DB layout (configured in Django settings, not here):
# DB 0: Django cache (disposable, can be flushed anytime)
# DB 1: Celery task result metadata (auto-expires via CELERY_RESULT_EXPIRES)
#
# Celery result key sizes vary widely depending on the task. ML pipeline result
# tasks (process_nats_pipeline_result) store full detection/classification JSON
# when CELERY_RESULT_EXTENDED=True. Measured on a demo instance (2026-03-26):
# Median: 5 KB, Avg: 191 KB, Max: 2.1 MB per key
# A job processing ~2,500 images can produce ~480 MB of result keys
#
# The role of Redis in this stack is still being evaluated — it may be reduced
# to cache-only or removed for Celery entirely. See issue #1189.

# Memory limit. Adjust based on available RAM. A production server with more
# memory might use a higher limit.
maxmemory 8gb

# Eviction policy. allkeys-lru evicts the least-recently-used key from any DB
# when maxmemory is reached. This works when all data is regenerable (cache)
# or has TTLs (results). If mixing persistent and ephemeral data, consider
# volatile-ttl or separate Redis instances per concern.
maxmemory-policy allkeys-lru

# Disable RDB persistence. Staging/demo data is disposable and bgsave of large
# datasets can saturate disk I/O on small volumes. A production deployment
# with adequate disk should consider enabling RDB snapshots for durability.
save ""

# Network timeouts. Tune based on network conditions — longer keepalive
# intervals may be needed for connections over unreliable networks.
tcp-keepalive 60
timeout 120
39 changes: 37 additions & 2 deletions config/settings/base.py
@@ -2,6 +2,7 @@
Base settings to build other settings files upon.
"""

import re
import socket
from pathlib import Path

@@ -263,6 +264,21 @@
}
REDIS_URL = env("REDIS_URL", default=None)


# Derive a separate Redis DB for Celery results (DB 1) from REDIS_URL (DB 0).
# This keeps Django cache (DB 0) and Celery task metadata (DB 1) isolated so they
# can be flushed and monitored independently.
# TODO: consider separate Redis instances with different eviction policies:
# allkeys-lru for cache, volatile-ttl for results. See issue #1189.
def _celery_result_backend_url(redis_url):
    if not redis_url:
        return None
    # Rewrite the trailing DB number (e.g. /0 -> /1), or append /1 if the URL
    # carries no explicit DB. Handles a bare trailing slash as well.
    if re.search(r"/\d+$", redis_url):
        return re.sub(r"/\d+$", "/1", redis_url)
    return redis_url.rstrip("/") + "/1"
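# Example behavior (illustrative, not part of the settings module):
#   "redis://redis:6379/0" -> "redis://redis:6379/1"
#   "redis://redis:6379"   -> "redis://redis:6379/1"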


CELERY_RESULT_BACKEND_URL = env("CELERY_RESULT_BACKEND", default=None) or _celery_result_backend_url(REDIS_URL)

# NATS
# ------------------------------------------------------------------------------
NATS_URL = env("NATS_URL", default="nats://localhost:4222") # type: ignore[no-untyped-call]
@@ -310,15 +326,31 @@
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#std:setting-broker_url
CELERY_BROKER_URL = env("CELERY_BROKER_URL")
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#std:setting-result_backend
# "rpc://" means use RabbitMQ for results backend by default
CELERY_RESULT_BACKEND = env("CELERY_RESULT_BACKEND", default="rpc://") # type: ignore[no-untyped-call]
# Use Redis DB 1 for results (separate from the cache on DB 0).
# An explicit CELERY_RESULT_BACKEND env var takes precedence; otherwise the
# URL is derived from REDIS_URL. See issue #1189 for the result backend architecture discussion.
CELERY_RESULT_BACKEND = CELERY_RESULT_BACKEND_URL or "rpc://"
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#result-extended
# Stores full task args/kwargs/name in the result backend alongside status.
# Useful for: inspecting task arguments in Flower, debugging failed tasks,
# post-hoc analysis of what data a task received.
# Cost: result keys are large because process_nats_pipeline_result receives the
# full ML result JSON as args. Measured on demo (298 keys, 2026-03-26):
# Median: 5 KB, Avg: 191 KB, Max: 2.1 MB per key
# Distribution: 29 <1KB, 195 1-10KB, 52 100KB-1MB, 22 >1MB
# With thousands of tasks per job, this adds significant memory pressure.
# TODO: consider disabling this or setting ignore_result=True on bulk tasks
# like process_nats_pipeline_result to reduce result backend load. See #1189.
CELERY_RESULT_EXTENDED = True
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#result-backend-always-retry
# https://github.com/celery/celery/pull/6122
CELERY_RESULT_BACKEND_ALWAYS_RETRY = True
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#result-backend-max-retries
CELERY_RESULT_BACKEND_MAX_RETRIES = 10
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#std:setting-result_expires
# Auto-expire task results after 72 hours. Keeps results available for inspection
# and troubleshooting while preventing unbounded growth. Override via env var (seconds).
CELERY_RESULT_EXPIRES = int(env("CELERY_RESULT_EXPIRES", default="259200")) # type: ignore[no-untyped-call]
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#std:setting-accept_content
CELERY_ACCEPT_CONTENT = ["json"]
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#std:setting-task_serializer
@@ -386,6 +418,9 @@
CELERY_BROKER_CONNECTION_MAX_RETRIES = None # Retry forever


# Allow large request bodies from ML workers posting classification results
DATA_UPLOAD_MAX_MEMORY_SIZE = 100 * 1024 * 1024 # 100MB (default 2.5MB)

# django-rest-framework
# -------------------------------------------------------------------------------
# django-rest-framework - https://www.django-rest-framework.org/api-guide/settings/