Update staging compose for demo/preview deployments #1181
Conversation
Update docker-compose.staging.yml to serve as the standard config for staging, demo, and branch preview environments:

- Remove local Postgres (DB is always external via DATABASE_IP)
- Add RabbitMQ container for Celery task broker
- Add NATS container (was present but commented out in depends_on)
- Add restart: always to all services
- Switch from .envs/.local/.postgres to .envs/.production/.postgres
- Remove hardcoded container_name on NATS (allows multiple instances)
- Remove awscli service (backups handled by TeamCity)
- RabbitMQ credentials configured via .envs/.production/.django, not hardcoded in compose

Add compose/staging/docker-compose.db.yml as an optional convenience for running a local PostgreSQL container when no external DB is available (e.g., ood environment, local testing).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gunicorn 20.x requires pkg_resources from setuptools, which was removed in setuptools 82+. Fresh Docker image builds fail with ModuleNotFoundError on startup. gunicorn 23 drops the pkg_resources dependency entirely. Closes #1180 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
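The fix itself is a one-line version pin. A sketch of the relevant `requirements/base.txt` entry (the surrounding file contents are assumptions):

```text
# gunicorn 20.x imports pkg_resources at startup; setuptools 82+ removed it,
# so fresh image builds fail with ModuleNotFoundError. 23.x drops the dependency.
gunicorn==23.0.0
```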
✅ Deploy Preview for antenna-ssec canceled.

✅ Deploy Preview for antenna-preview canceled.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behavior can be configured in the CodeRabbit settings.
📝 Walkthrough

Adds a staging compose DB file and Redis config, and changes the staging compose to run Redis, RabbitMQ, and NATS locally while Django connects to Postgres via `DATABASE_IP`.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Dev as Developer / CI
    participant Compose as docker-compose (staging)
    participant Django as Django container
    participant Postgres as Postgres (external or compose.db)
    participant Redis as Redis
    participant RabbitMQ as RabbitMQ
    participant NATS as NATS
    participant Celery as Celery workers
    Dev->>Compose: docker compose --env-file .envs/.production/.compose up -d --build
    Compose->>Redis: start (uses compose/staging/redis.conf)
    Compose->>RabbitMQ: start
    Compose->>NATS: start (JetStream)
    Compose->>Django: start (depends_on: redis, rabbitmq, nats)
    Django->>Postgres: connect via DATABASE_IP (or optional compose.db)
    Django->>Redis: cache/sessions and derive CELERY_RESULT_BACKEND_URL
    Django->>RabbitMQ: publish tasks
    RabbitMQ->>Celery: deliver tasks to workers
    Django->>NATS: telemetry / pub-sub
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 3
🧹 Nitpick comments (1)
docker-compose.staging.yml (1)
68-70: Bind RabbitMQ management UI to localhost by default. Line 69 exposes port `15672` broadly; in staging/demo this is safer as localhost-bound unless remote admin access is explicitly needed.

Proposed hardening:

```diff
 ports:
-  - "15672:15672"
+  - "127.0.0.1:15672:15672"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker-compose.staging.yml` around lines 68 - 70, The ports mapping currently exposes the RabbitMQ management UI publicly via the line with "15672:15672"; change the ports entry so the management port is bound to localhost only (e.g., use a host IP prefix like 127.0.0.1:15672:15672) under the same ports block so the service still restarts as configured (restart: always) but the management UI is only accessible from the host machine.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@compose/staging/docker-compose.db.yml`:
- Around line 5-17: The comments are ambiguous about DB connectivity modes
(Docker network vs host bridge). Update the header comments in
compose/staging/docker-compose.db.yml to clearly state which mode the stack
expects: if the app uses POSTGRES_HOST=db (container-to-container via Docker
network) remove or reword the DATABASE_IP/host-bridge instructions;
alternatively, if the intended workflow requires the host bridge and
DATABASE_IP, change the POSTGRES_HOST guidance accordingly and explain when to
start the DB with docker compose -f compose/staging/docker-compose.db.yml up -d
vs when to set DATABASE_IP for the app compose. Ensure you reference
POSTGRES_HOST and DATABASE_IP and mention the two compose files
(compose/staging/docker-compose.db.yml and docker-compose.staging.yml) so
readers know which mode each file supports.
- Around line 35-36: The compose file currently publishes PostgreSQL on all
interfaces via the ports mapping `ports: - "5432:5432"`; change this to bind to
localhost by replacing it with `ports: - "127.0.0.1:5432:5432"` (or remove the
ports mapping entirely and rely on an internal network) so Postgres only listens
on loopback for staging/demo unless external access is explicitly required;
update any documentation or scripts that expect an externally accessible port
accordingly.
In `@docker-compose.staging.yml`:
- Around line 14-15: The rabbitmq service is missing an env_file so it doesn't
pick up RABBITMQ_DEFAULT_USER/RABBITMQ_DEFAULT_PASS from
.envs/.production/.django; update the rabbitmq service definition to add an
env_file pointing to .envs/.production/.django (so the RABBITMQ_DEFAULT_USER and
RABBITMQ_DEFAULT_PASS values are loaded) and remove or override any conflicting
environment: entries if present; ensure the service name "rabbitmq" and the
variables RABBITMQ_DEFAULT_USER / RABBITMQ_DEFAULT_PASS are used consistently
with Django's CELERY_BROKER_URL.
---
Nitpick comments:
In `@docker-compose.staging.yml`:
- Around line 68-70: The ports mapping currently exposes the RabbitMQ management
UI publicly via the line with "15672:15672"; change the ports entry so the
management port is bound to localhost only (e.g., use a host IP prefix like
127.0.0.1:15672:15672) under the same ports block so the service still restarts
as configured (restart: always) but the management UI is only accessible from
the host machine.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3cbcc483-c4ef-463f-8d20-ba97a05053c7
📒 Files selected for processing (3)
- compose/staging/docker-compose.db.yml
- docker-compose.staging.yml
- requirements/base.txt
Pull request overview
Updates the staging Docker Compose setup to be a shared baseline for staging/demo/branch-preview deployments by running Redis/RabbitMQ/NATS locally while connecting to an external Postgres via DATABASE_IP, and upgrades Gunicorn to avoid fresh-build failures on slim Python images.
Changes:
- Upgrade `gunicorn` to `23.0.0`.
- Revise `docker-compose.staging.yml` to remove the local Postgres service and add local RabbitMQ + NATS (with restarts and updated env-file usage).
- Add an optional `compose/staging/docker-compose.db.yml` for running a local Postgres container.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| requirements/base.txt | Bumps Gunicorn to 23.0.0. |
| docker-compose.staging.yml | Reworks staging compose to use external DB + local Redis/RabbitMQ/NATS. |
| compose/staging/docker-compose.db.yml | Adds an optional local Postgres compose for staging-like setups. |
- Add env_file to rabbitmq service so it picks up RABBITMQ_DEFAULT_USER/RABBITMQ_DEFAULT_PASS from .django env
- Use ${DATABASE_IP:?} required-variable syntax for fail-fast on missing config
- Bind local Postgres to 127.0.0.1 instead of 0.0.0.0
- Clarify DB compose comments: document host-bridge connectivity via DATABASE_IP, remove ambiguous "Docker network" wording
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
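A minimal compose sketch of what the first two fixes look like together. Service layout, the RabbitMQ image tag, and the exact variable wiring are assumptions; only the env-file path and the `${VAR:?}` syntax come from the commit message:

```yaml
services:
  rabbitmq:
    image: rabbitmq:3-management  # image tag is an assumption
    restart: always
    # Loads RABBITMQ_DEFAULT_USER / RABBITMQ_DEFAULT_PASS so broker credentials
    # match the CELERY_BROKER_URL Django reads from the same env file.
    env_file:
      - ./.envs/.production/.django

  django:
    environment:
      # ${VAR:?msg} makes compose abort with an error when DATABASE_IP is
      # unset, instead of silently starting with an empty DB host.
      DATABASE_IP: ${DATABASE_IP:?DATABASE_IP must be set}
```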
Internal services (Redis, RabbitMQ, NATS) don't need host port exposure — only the app containers talk to them via the Docker network. Removing host ports means multiple instances (branch previews, worktrees) never conflict on these ports. Django and Flower ports are now configurable via DJANGO_PORT and FLOWER_PORT env vars (default 5001 and 5550). Also use host-gateway (works on all platforms) instead of platform-specific Docker bridge IPs in DB compose docs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
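The port scheme described above can be sketched as follows. The default host ports (5001, 5550) are from the commit message; the container-internal ports and service shapes are assumptions:

```yaml
services:
  django:
    ports:
      # ${VAR:-default} falls back when the variable is unset, so each
      # branch-preview instance can export its own DJANGO_PORT / FLOWER_PORT.
      - "${DJANGO_PORT:-5001}:5000"   # internal port is an assumption
  flower:
    ports:
      - "${FLOWER_PORT:-5550}:5555"
  redis: {}     # no host ports: reachable only on the compose network
  rabbitmq: {}  # likewise, so multiple instances never conflict
```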
Setup instructions for single and multi-instance staging deployments, covering environment configuration, database options, migrations, sample data, and port management for running multiple instances on the same host. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ML workers post classification results with up to 29K categories per image, easily exceeding Django's default 2.5MB request body limit. This caused 413 errors on the demo environment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
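Django enforces this through the `DATA_UPLOAD_MAX_MEMORY_SIZE` setting, whose default is 2,621,440 bytes (2.5 MB). The change amounts to a settings override like this sketch (the exact constant used in `config/settings/base.py` is an assumption):

```python
# Django's default is 2.5 MB (2621440 bytes). Classification payloads with up
# to ~29K categories per image exceed it, producing 413 errors, so raise the
# limit to 100 MB.
DATA_UPLOAD_MAX_MEMORY_SIZE = 100 * 1024 * 1024

print(DATA_UPLOAD_MAX_MEMORY_SIZE)
```

Any reverse proxy in front of Django needs a matching body-size limit, since nginx rejects the request before Django sees it.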
The default CELERY_RESULT_BACKEND was "rpc://" which uses RabbitMQ for results. This caused channel exhaustion (65,535 limit), connection resets, and worker crashes on the demo environment. Changes: - Derive CELERY_RESULT_BACKEND from REDIS_URL using DB 1 instead of the cache DB 0. This keeps cache and task results isolated so they can be flushed and monitored independently. - Add maxmemory config to staging Redis (8gb, allkeys-lru) - Falls back to rpc:// only if no REDIS_URL is configured - Env var CELERY_RESULT_BACKEND still overrides if explicitly set Redis DB layout: DB 0: Django cache (disposable, allkeys-lru eviction) DB 1: Celery task result metadata (TTL-based via CELERY_RESULT_EXPIRES) Relates to #1189 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document what CELERY_RESULT_EXTENDED does, why it's expensive (~19KB per task vs ~200B), and note that bulk tasks like process_nats_pipeline_result could use ignore_result=True to avoid storing large ML result JSON in the result backend. Relates to #1189 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed aaab7c9 to 1162e6a.
- Move Redis config to compose/staging/redis.conf for clarity - Disable RDB persistence (save "") — bgsave of large datasets saturates disk I/O on small volumes, hanging NATS and other services - Add CELERY_RESULT_EXPIRES=3600 default in base.py to auto-expire task results after 1 hour, preventing unbounded Redis memory growth - Keep maxmemory 8gb and allkeys-lru eviction policy Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
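A sketch of what `compose/staging/redis.conf` plausibly contains given the commit message. The directives are standard Redis configuration syntax; the file's full contents are an assumption:

```conf
# Disable RDB snapshots: bgsave of large datasets saturates disk I/O on
# small volumes, hanging NATS and other services on the same host.
save ""

# Cap memory and evict least-recently-used keys when the cap is reached.
maxmemory 8gb
maxmemory-policy allkeys-lru
```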
Force-pushed 1162e6a to 14c26f5.
Actionable comments posted: 1
🧹 Nitpick comments (1)
docker-compose.staging.yml (1)
70-71: Flower volume mount for persistent data. Docker will create `./data/flower/` if it doesn't exist. If Flower writes with a non-root UID, ensure the directory has appropriate permissions or use named volumes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker-compose.staging.yml` around lines 70 - 71, The volume mount "- ./data/flower/:/data/" can lead to permission issues if Flower runs as a non-root UID; either ensure the host directory exists and is chowned to the same UID/GID Flower runs as (create the directory and run chown to the container user), or replace the bind mount with a named Docker volume (e.g., declare a named volume like "flower_data" and mount it to /data) and add the named volume under the compose "volumes" section so Docker manages ownership; alternatively set the "user" field on the Flower service to match the host directory ownership if you must keep the bind mount.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/settings/base.py`:
- Around line 273-277: The _celery_result_backend_url function fails for Redis
URLs with query strings or trailing slashes because the current regex only
matches a numeric DB segment at the end; fix it by using urllib.parse to parse
the URL, inspect and modify the path component: use
urllib.parse.urlparse(redis_url), if path is empty or "/" set path = "/1",
otherwise split the path by "/" and if the last non-empty segment is a digit
replace it with "1" else append "1" as a new segment; then rebuild the URL with
urllib.parse.urlunparse to preserve scheme, netloc, params, query, and fragment
and return that string. Ensure this logic is implemented inside
_celery_result_backend_url and only runs when redis_url is truthy.
---
Nitpick comments:
In `@docker-compose.staging.yml`:
- Around line 70-71: The volume mount "- ./data/flower/:/data/" can lead to
permission issues if Flower runs as a non-root UID; either ensure the host
directory exists and is chowned to the same UID/GID Flower runs as (create the
directory and run chown to the container user), or replace the bind mount with a
named Docker volume (e.g., declare a named volume like "flower_data" and mount
it to /data) and add the named volume under the compose "volumes" section so
Docker manages ownership; alternatively set the "user" field on the Flower
service to match the host directory ownership if you must keep the bind mount.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2c640d04-e86c-4687-826a-0a633a5f3539
📒 Files selected for processing (3)
- compose/staging/redis.conf
- config/settings/base.py
- docker-compose.staging.yml
✅ Files skipped from review due to trivial changes (1)
- compose/staging/redis.conf
Add .envs/.production/.compose-example documenting required DATABASE_IP variable. Add compose/staging/deploy.sh as the canonical deploy script (fetch, build, migrate). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The regex-based approach failed for URLs with query strings or trailing slashes. Use urlparse/urlunparse to properly handle the path component. Also clarifies the Redis DB numbering convention in comments. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
🧹 Nitpick comments (1)
compose/staging/deploy.sh (1)
5-6: Consider adding `set -o pipefail` for robust error handling. With only `errexit`, failures in piped commands (except the last) are silently ignored. Adding `pipefail` ensures the script exits on any command failure within a pipeline.

♻️ Proposed fix:

```diff
 set -o errexit
+set -o pipefail
 set -o xtrace
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compose/staging/deploy.sh` around lines 5 - 6, Add robust pipeline failure handling by enabling pipefail in the shell setup: update the script's shell options where set -o errexit and set -o xtrace are configured to also include set -o pipefail so that any failing command in a pipeline causes the script to exit; modify the block containing "set -o errexit" and "set -o xtrace" to include "set -o pipefail".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@compose/staging/deploy.sh`:
- Line 10: The script currently runs git fetch origin but never updates the
working tree, so update deploy.sh to actually pull or reset to the intended
branch after the fetch (e.g., run git pull --ff-only origin <branch> or git
reset --hard origin/<branch>); locate the git fetch origin line in deploy.sh and
replace or follow it with the chosen update command and ensure any local changes
are handled (stash/abort) so the Docker build uses the freshly deployed code.
---
Nitpick comments:
In `@compose/staging/deploy.sh`:
- Around line 5-6: Add robust pipeline failure handling by enabling pipefail in
the shell setup: update the script's shell options where set -o errexit and set
-o xtrace are configured to also include set -o pipefail so that any failing
command in a pipeline causes the script to exit; modify the block containing
"set -o errexit" and "set -o xtrace" to include "set -o pipefail".
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9784f456-18c6-4b87-b89f-21ec651d0176
📒 Files selected for processing (2)
- .envs/.production/.compose-example
- compose/staging/deploy.sh
✅ Files skipped from review due to trivial changes (1)
- .envs/.production/.compose-example
Document client_max_body_size 100M requirement for ML worker payloads, proxy_read_timeout for long API operations, and example nginx config for SSL termination. Also fix deploy.sh symlink resolution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
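An illustrative nginx fragment for the directives called out above. The server name, upstream address, and SSL details are placeholders, and the README's actual example may differ:

```nginx
server {
    listen 443 ssl;
    server_name staging.example.org;  # placeholder

    # ssl_certificate / ssl_certificate_key omitted from this sketch

    location / {
        # ML workers POST classification results far larger than nginx's
        # 1M default body limit; match Django's 100 MB ceiling.
        client_max_body_size 100M;
        # Long-running API operations need more than the 60s default.
        proxy_read_timeout 300s;
        proxy_pass http://127.0.0.1:5001;  # DJANGO_PORT default per this PR
    }
}
```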
git fetch updates remote refs but does not update the working tree, so the Docker build was using stale code. Use git pull --ff-only to actually update the checked-out branch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…on convention Staging means single-box deployment (demo, preview, testing), not a pre-production environment. The .envs/.production/ directory is a cookiecutter-django convention for non-local-dev config. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Update `docker-compose.staging.yml` to serve as the standard config for single-box deployments (demos, previews, testing). Local Redis, RabbitMQ, and NATS containers; external database only.

Changes

Staging compose (`docker-compose.staging.yml`)
- External database only, reached via `DATABASE_IP` in `.envs/.production/.compose`
- NATS enabled in `depends_on` (was commented out)
- `restart: always` on all services
- No hardcoded `container_name` on NATS (allows multiple instances on same host)
- `awscli` service removed (backups handled externally)

New files
- `compose/staging/docker-compose.db.yml` — optional local PostgreSQL container
- `compose/staging/redis.conf` — Redis config (disables bgsave, sets maxmemory/eviction)
- `compose/staging/deploy.sh` — deploy script with branch/host safety echo, uses `git pull --ff-only`
- `compose/staging/README.md` — setup guide, env reference, multi-instance instructions, reverse proxy example
- `.envs/.production/.compose-example` — documents required `DATABASE_IP` variable

Settings (`config/settings/base.py`)
- `_celery_result_backend_url` derives a DB 1 URL from `REDIS_URL` using `urllib.parse`. Environments with `CELERY_RESULT_BACKEND` explicitly set (e.g. production) are unaffected.
- `DATA_UPLOAD_MAX_MEMORY_SIZE` raised to 100 MB for ML worker result payloads
- gunicorn upgraded to 23.0.0 (handles the `pkg_resources` removal in Python 3.12+)

Production impact
- The Redis-derived result backend applies only when `CELERY_RESULT_BACKEND` is not explicitly set in the env. Production sets it explicitly, so no change until you opt in.
- The `DATA_UPLOAD_MAX_MEMORY_SIZE` increase allows larger ML result payloads (already needed).
- `docker-compose.production.yml` is not modified.

Required env setup for staging/demo

`.envs/.production/.django` must include:
- `NATS_URL=nats://nats:4222` — without it, the app defaults to `127.0.0.1:4222`, which doesn't resolve inside Docker containers
- `CELERY_BROKER_URL=amqp://user:pass@rabbitmq:5672/` — RabbitMQ credentials

See `compose/staging/README.md` for the full variable reference.

Reverse proxy

The staging README includes an example nginx config. Key requirement: `client_max_body_size 100M` — ML workers POST large result payloads that exceed the default 1M/10M limits.

Usage
Environments tested
Relates to RolnickLab/ami-devops#1, RolnickLab/ami-admin#66
Closes #1180
Test plan
- `docker compose config` validates without errors

🤖 Generated with Claude Code