Merged

19 commits
8b12cbf
feat: update staging compose for demo/preview deployments
mihow Mar 18, 2026
b3d9771
fix: upgrade gunicorn 20.1.0 → 23.0.0
mihow Mar 18, 2026
1bb2646
fix: address PR review feedback
mihow Mar 18, 2026
d5e8f6f
fix: remove host port bindings from internal services
mihow Mar 18, 2026
7f75f98
docs: add staging deployment guide
mihow Mar 18, 2026
6b699b7
fix: increase DATA_UPLOAD_MAX_MEMORY_SIZE for ML worker results
mihow Mar 24, 2026
91a7f49
fix(celery): use Redis DB 1 for result backend, separate from cache
mihow Mar 25, 2026
c845143
docs(celery): add comments explaining CELERY_RESULT_EXTENDED impact
mihow Mar 25, 2026
14c26f5
fix(redis): add redis.conf, disable bgsave, add CELERY_RESULT_EXPIRES
mihow Mar 26, 2026
bd649a9
Merge branch 'main' into feat/update-staging-compose
mihow Apr 1, 2026
f12610f
chore(staging): add .compose-example and deploy.sh script
mihow Apr 1, 2026
fea3af2
fix(settings): use urllib.parse for Redis DB URL rewriting
mihow Apr 1, 2026
87b8c12
docs(staging): add reverse proxy section with nginx example
mihow Apr 1, 2026
1f2db69
fix(staging): deploy.sh should pull, not just fetch
mihow Apr 1, 2026
47e9fdd
fix(staging): add branch/host echo to deploy.sh before deploying
mihow Apr 1, 2026
095c6fa
docs(staging): clarify staging vs production, explain .envs/.producti…
mihow Apr 1, 2026
9a3b020
docs(staging): note potential rename to demo in future release
mihow Apr 1, 2026
076258f
fix: sort stdlib imports in base.py (isort)
mihow Apr 1, 2026
d0a8f1d
fix: unused import
mihow Apr 1, 2026
170 changes: 170 additions & 0 deletions compose/staging/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,170 @@
# Staging Deployment

Deploy the Antenna platform with local Redis, RabbitMQ, and NATS containers.
The database always runs outside the app stack — a dedicated server, a managed
service, or the optional local Postgres container included here.

## Quick Start (single instance)

### 1. Configure environment files

Copy the examples and fill in the values:

```bash
# Django settings
cp .envs/.production/.django-example .envs/.production/.django

# Database credentials
cat > .envs/.production/.postgres << 'EOF'
POSTGRES_HOST=db
POSTGRES_PORT=5432
POSTGRES_DB=antenna_staging
POSTGRES_USER=antenna
POSTGRES_PASSWORD=<generate-a-password>
EOF

# Database host IP
cat > .envs/.production/.compose << 'EOF'
DATABASE_IP=host-gateway
EOF
```

Key settings to configure in `.envs/.production/.django`:

| Variable | Example | Notes |
|---|---|---|
| `DJANGO_SECRET_KEY` | `<random-string>` | Generate with `python -c "from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())"` |
| `DJANGO_ALLOWED_HOSTS` | `*` or `api.staging.example.com` | |
| `REDIS_URL` | `redis://redis:6379/0` | Always use `redis` hostname (local container) |
| `CELERY_BROKER_URL` | `amqp://antenna:password@rabbitmq:5672/` | Always use `rabbitmq` hostname |
| `RABBITMQ_DEFAULT_USER` | `antenna` | Must match the user in `CELERY_BROKER_URL` |
| `RABBITMQ_DEFAULT_PASS` | `<password>` | Must match the password in `CELERY_BROKER_URL` |
| `NATS_URL` | `nats://nats:4222` | Always use `nats` hostname |
| `CELERY_FLOWER_USER` | `flower` | Basic auth for the Flower web UI |
| `CELERY_FLOWER_PASSWORD` | `<password>` | |
| `SENDGRID_API_KEY` | `placeholder` | Set a real key to enable email, or any non-empty string to skip |
| `DJANGO_AWS_STORAGE_BUCKET_NAME` | `my-bucket` | S3-compatible object storage for media/static files |
| `DJANGO_SUPERUSER_EMAIL` | `admin@example.com` | Used by `create_demo_project` and `createsuperuser --noinput` |
| `DJANGO_SUPERUSER_PASSWORD` | `<password>` | Used by `create_demo_project` and `createsuperuser --noinput` |
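If you prefer to write the file from scratch instead of copying the example, a minimal `.django` could look like this (all values are placeholders — replace every `change-me` before deploying):

```bash
mkdir -p .envs/.production
cat > .envs/.production/.django << 'EOF'
DJANGO_SECRET_KEY=change-me
DJANGO_ALLOWED_HOSTS=*
REDIS_URL=redis://redis:6379/0
CELERY_BROKER_URL=amqp://antenna:change-me@rabbitmq:5672/
RABBITMQ_DEFAULT_USER=antenna
RABBITMQ_DEFAULT_PASS=change-me
NATS_URL=nats://nats:4222
CELERY_FLOWER_USER=flower
CELERY_FLOWER_PASSWORD=change-me
SENDGRID_API_KEY=placeholder
DJANGO_AWS_STORAGE_BUCKET_NAME=my-bucket
DJANGO_SUPERUSER_EMAIL=admin@example.com
DJANGO_SUPERUSER_PASSWORD=change-me
EOF
```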

### 2. Start the database

If you have an external database, set `DATABASE_IP` in `.envs/.production/.compose`
to its IP address and skip this step.

For a local database container:

```bash
docker compose -f compose/staging/docker-compose.db.yml up -d

# Set DATABASE_IP to reach the host-published port from app containers
echo "DATABASE_IP=host-gateway" > .envs/.production/.compose
```

Verify the database is ready:

```bash
docker compose -f compose/staging/docker-compose.db.yml logs
# Should show: "database system is ready to accept connections"
```

### 3. Build and start the app

```bash
docker compose -f docker-compose.staging.yml \
--env-file .envs/.production/.compose build django

docker compose -f docker-compose.staging.yml \
--env-file .envs/.production/.compose up -d
```

### 4. Run migrations and create an admin user

```bash
# Shorthand for the compose command
COMPOSE="docker compose -f docker-compose.staging.yml --env-file .envs/.production/.compose"

# Apply database migrations
$COMPOSE run --rm django python manage.py migrate

# Create demo project with sample data and admin user
$COMPOSE run --rm django python manage.py create_demo_project

# Or just create an admin user without sample data
$COMPOSE run --rm django python manage.py createsuperuser --noinput
```

### 5. Verify

```bash
# API root
curl http://localhost:5001/api/v2/

# Django admin
# Open http://localhost:5001/admin/ in a browser

# Flower (Celery monitoring)
# Open http://localhost:5550/ in a browser

# NATS health (internal, but reachable via docker exec)
docker compose -f docker-compose.staging.yml \
--env-file .envs/.production/.compose \
exec nats wget -qO- http://localhost:8222/healthz
```

## Multiple Instances on the Same Host

Internal services (Redis, RabbitMQ, NATS) don't publish host ports, so they
never conflict between instances. Each compose project gets its own isolated
Docker network.

Only Django and Flower publish host ports. Override them with environment
variables and use a unique project name (`-p`):

```bash
# Instance A (defaults: Django on 5001, Flower on 5550)
docker compose -p antenna-main \
-f docker-compose.staging.yml \
--env-file .envs/.production/.compose up -d

# Instance B (custom ports)
DJANGO_PORT=5002 FLOWER_PORT=5551 \
docker compose -p antenna-feature-xyz \
-f docker-compose.staging.yml \
--env-file path/to/other/.compose up -d
```

Each instance needs its own:
- `.envs/.production/.compose` (can share `DATABASE_IP` if using the same DB server)
- `.envs/.production/.postgres` (use a different `POSTGRES_DB` per instance)
- `.envs/.production/.django` (can share most settings, but use unique `DJANGO_SECRET_KEY`)

If using the local database container, each instance needs its own DB container
too (or share one by creating multiple databases in it).
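When sharing one container, additional databases can be created inside it. A sketch (the `antenna_feature_xyz` name is hypothetical; assumes the `antenna` user from `.envs/.production/.postgres`), run via `psql` inside the `postgres` service:

```sql
-- Isolated database for a second instance; name it after the branch/preview
CREATE DATABASE antenna_feature_xyz OWNER antenna;
```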

## Stopping and Cleaning Up

```bash
# Stop the app stack
docker compose -f docker-compose.staging.yml \
--env-file .envs/.production/.compose down

# Stop the local database (data is preserved in a Docker volume)
docker compose -f compose/staging/docker-compose.db.yml down

# Remove everything including database data
docker compose -f compose/staging/docker-compose.db.yml down -v
```

## Database Options

The staging compose supports any PostgreSQL database reachable by IP:

| Option | `DATABASE_IP` | Notes |
|---|---|---|
| Local container | `host-gateway` | Use `compose/staging/docker-compose.db.yml` |
| Dedicated VM | `<server-ip>` | Best performance for shared environments |
| Managed service | `<service-ip>` | Cloud-hosted PostgreSQL |

Set `POSTGRES_HOST=db` in `.envs/.production/.postgres` — the `extra_hosts`
directive in the compose file maps `db` to whatever `DATABASE_IP` resolves to.
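The mapping is roughly this shape (a sketch — see `docker-compose.staging.yml` for the actual service definitions):

```yaml
services:
  django:
    extra_hosts:
      # "db" resolves to whatever DATABASE_IP is set to in .envs/.production/.compose
      - "db:${DATABASE_IP}"
```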
38 changes: 38 additions & 0 deletions compose/staging/docker-compose.db.yml
@@ -0,0 +1,38 @@
# Optional local PostgreSQL for staging environments.
#
# Use this when you don't have an external database (e.g., for local testing
# or isolated branch previews). Publishes PostgreSQL on localhost:5432.
#
# Usage:
# # Start the database first
# docker compose -f compose/staging/docker-compose.db.yml up -d
#
# # Then start the app stack
# docker compose -f docker-compose.staging.yml --env-file .envs/.production/.compose up -d
#
# The app connects to the database via extra_hosts (db → DATABASE_IP).
# Set DATABASE_IP to the Docker bridge gateway so the app container can
# reach the host-published port:
#
# .envs/.production/.compose:
# DATABASE_IP=host-gateway # Recommended (resolves to host on all platforms)
#
# .envs/.production/.postgres:
# POSTGRES_HOST=db # resolves via extra_hosts to DATABASE_IP

volumes:
staging_postgres_data: {}

services:
postgres:
build:
context: ../../
dockerfile: ./compose/local/postgres/Dockerfile
volumes:
- staging_postgres_data:/var/lib/postgresql/data
- ../../data/db/snapshots:/backups
env_file:
- ../../.envs/.production/.postgres
ports:
- "127.0.0.1:5432:5432"
restart: always
38 changes: 38 additions & 0 deletions compose/staging/redis.conf
@@ -0,0 +1,38 @@
# Redis configuration for staging/demo environments
#
# This is a minimal config for Redis running as a Docker container alongside
# the app. A production deployment would typically use a separate Redis server
# with its own config tuned for the available resources.
#
# Redis DB layout (configured in Django settings, not here):
# DB 0: Django cache (disposable, can be flushed anytime)
# DB 1: Celery task result metadata (auto-expires via CELERY_RESULT_EXPIRES)
#
# Celery result key sizes vary widely depending on the task. ML pipeline result
# tasks (process_nats_pipeline_result) store full detection/classification JSON
# when CELERY_RESULT_EXTENDED=True. Measured on a demo instance (2026-03-26):
# Median: 5 KB, Avg: 191 KB, Max: 2.1 MB per key
# A job processing ~2,500 images can produce ~480 MB of result keys
#
# The role of Redis in this stack is still being evaluated — it may be reduced
# to cache-only or removed for Celery entirely. See issue #1189.

# Memory limit. Adjust based on available RAM. A production server with more
# memory might use a higher limit.
maxmemory 8gb

# Eviction policy. allkeys-lru evicts the least-recently-used key from any DB
# when maxmemory is reached. This works when all data is regenerable (cache)
# or has TTLs (results). If mixing persistent and ephemeral data, consider
# volatile-ttl or separate Redis instances per concern.
maxmemory-policy allkeys-lru

# Disable RDB persistence. Staging/demo data is disposable and bgsave of large
# datasets can saturate disk I/O on small volumes. A production deployment
# with adequate disk should consider enabling RDB snapshots for durability.
save ""

# Network timeouts. Tune based on network conditions — longer keepalive
# intervals may be needed for connections over unreliable networks.
tcp-keepalive 60
timeout 120
39 changes: 37 additions & 2 deletions config/settings/base.py
@@ -2,6 +2,7 @@
Base settings to build other settings files upon.
"""

import re
import socket
from pathlib import Path

@@ -263,6 +264,21 @@
}
REDIS_URL = env("REDIS_URL", default=None)


# Derive a separate Redis DB for Celery results (DB 1) from REDIS_URL (DB 0).
# This keeps Django cache (DB 0) and Celery task metadata (DB 1) isolated so they
# can be flushed and monitored independently.
# TODO: consider separate Redis instances with different eviction policies:
# allkeys-lru for cache, volatile-ttl for results. See issue #1189.
def _celery_result_backend_url(redis_url):
    if not redis_url:
        return None
    # Rewrite the trailing DB number (e.g. /0 -> /1), or append /1 if the URL
    # carries no explicit DB. Handles a bare trailing slash as well.
    if re.search(r"/\d+$", redis_url):
        return re.sub(r"/\d+$", "/1", redis_url)
    return redis_url.rstrip("/") + "/1"
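# Example behavior (illustrative, not part of the settings module):
#   "redis://redis:6379/0" -> "redis://redis:6379/1"
#   "redis://redis:6379"   -> "redis://redis:6379/1"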


CELERY_RESULT_BACKEND_URL = env("CELERY_RESULT_BACKEND", default=None) or _celery_result_backend_url(REDIS_URL)

# NATS
# ------------------------------------------------------------------------------
NATS_URL = env("NATS_URL", default="nats://localhost:4222") # type: ignore[no-untyped-call]
@@ -310,15 +326,31 @@
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#std:setting-broker_url
CELERY_BROKER_URL = env("CELERY_BROKER_URL")
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#std:setting-result_backend
# "rpc://" means use RabbitMQ for results backend by default
CELERY_RESULT_BACKEND = env("CELERY_RESULT_BACKEND", default="rpc://") # type: ignore[no-untyped-call]
# Use Redis DB 1 for results (separate from the cache on DB 0).
# An explicit CELERY_RESULT_BACKEND env var takes precedence; otherwise the
# URL is derived from REDIS_URL. See issue #1189 for the result backend architecture discussion.
CELERY_RESULT_BACKEND = CELERY_RESULT_BACKEND_URL or "rpc://"
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#result-extended
# Stores full task args/kwargs/name in the result backend alongside status.
# Useful for: inspecting task arguments in Flower, debugging failed tasks,
# post-hoc analysis of what data a task received.
# Cost: result keys are large because process_nats_pipeline_result receives the
# full ML result JSON as args. Measured on demo (298 keys, 2026-03-26):
# Median: 5 KB, Avg: 191 KB, Max: 2.1 MB per key
# Distribution: 29 <1KB, 195 1-10KB, 52 100KB-1MB, 22 >1MB
# With thousands of tasks per job, this adds significant memory pressure.
# TODO: consider disabling this or setting ignore_result=True on bulk tasks
# like process_nats_pipeline_result to reduce result backend load. See #1189.
CELERY_RESULT_EXTENDED = True
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#result-backend-always-retry
# https://github.com/celery/celery/pull/6122
CELERY_RESULT_BACKEND_ALWAYS_RETRY = True
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#result-backend-max-retries
CELERY_RESULT_BACKEND_MAX_RETRIES = 10
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#std:setting-result_expires
# Auto-expire task results after 72 hours. Keeps results available for inspection
# and troubleshooting while preventing unbounded growth. Override via env var (seconds).
CELERY_RESULT_EXPIRES = int(env("CELERY_RESULT_EXPIRES", default="259200")) # type: ignore[no-untyped-call]
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#std:setting-accept_content
CELERY_ACCEPT_CONTENT = ["json"]
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#std:setting-task_serializer
@@ -386,6 +418,9 @@
CELERY_BROKER_CONNECTION_MAX_RETRIES = None # Retry forever


# Allow large request bodies from ML workers posting classification results
DATA_UPLOAD_MAX_MEMORY_SIZE = 100 * 1024 * 1024 # 100MB (default 2.5MB)

# django-rest-framework
# -------------------------------------------------------------------------------
# django-rest-framework - https://www.django-rest-framework.org/api-guide/settings/