
k8s: shared-cluster safety checks and deployment-id decoupling #748

Merged
prathamesh0 merged 14 commits into main from feat/so-b86-auto-configmap-host-path on Apr 21, 2026

Conversation

@prathamesh0
Collaborator

  • Kind extraMount compatibility: fail fast at deployment start when a new deployment's mounts don't match the running cluster; warn when the first cluster is created without a kind-mount-root umbrella; replace the cryptic ConfigException with readable errors when the cluster is missing
  • Auto-ConfigMap for file-level host-path compose volumes (so-7fc): ../config/foo.sh:/opt/foo.sh-style binds become per-namespace ConfigMaps at deploy start instead of aliasing via the kind extraMount chain. deploy create rejects :rw, subdirs, and over-budget sources. Deployment-dir layout unchanged
  • Namespace ownership: stamp the namespace with laconic.com/deployment-dir on create; fail loudly if another deployment tries to land in the same namespace. Pre-existing namespaces adopt ownership on next start
  • deployment-id / cluster-id decoupling: split the two roles (kube context vs resource-name prefix) into separate deployment.yml fields. Backward-compat fallback keeps existing resource names stable
  • Close stale pebbles so-n1n and so-ad7

prathamesh0 and others added 14 commits April 20, 2026 09:30
Kind applies extraMounts only at cluster creation. When a deployment joins
an existing shared cluster, any extraMount its kind-config declares that
isn't already active on the running control-plane is silently ignored —
PVs backed by those mounts fall through to the node's overlay filesystem
and lose data on cluster destroy.

Validate this up front in create_cluster():
- On cluster reuse, compare the new deployment's extraMounts against the
  live bind mounts on the control-plane container (via docker inspect).
  Fail with a DeployerException listing every mismatched mount and
  pointing at docs/deployment_patterns.md.
- On first-time cluster creation without a /mnt umbrella mount
  (kind-mount-root unset), print a warning that future stacks may
  require a full recreate to add new host-path mounts.

Document the umbrella-mount convention (kind-mount-root) and the
migration path for existing clusters in docs/deployment_patterns.md.
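
A minimal sketch of the shape of that check, with helper and exception names assumed rather than taken from the actual code:

```python
import json
import subprocess


class DeployerException(Exception):
    pass  # stand-in for the project's exception type


def _live_binds(cluster_name: str) -> dict:
    # kind names the control-plane container "<cluster>-control-plane"
    out = subprocess.check_output(
        ["docker", "inspect", "-f", "{{json .Mounts}}",
         f"{cluster_name}-control-plane"]
    )
    return {m["Destination"]: m["Source"]
            for m in json.loads(out) if m["Type"] == "bind"}


def check_mounts_compatible(cluster_name: str, extra_mounts: list) -> None:
    live = _live_binds(cluster_name)
    missing = [m for m in extra_mounts if m["containerPath"] not in live]
    if missing:
        detail = "\n".join(f"  {m['hostPath']} -> {m['containerPath']}" for m in missing)
        raise DeployerException(
            f"extraMounts not present on running kind cluster '{cluster_name}':\n"
            f"{detail}\nSee docs/deployment_patterns.md."
        )
```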

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nagement

--skip-cluster-management is the default, so `deployment start` without
an existing cluster lands straight in connect_api() which raises a
cryptic kubernetes.config.ConfigException about a missing kube context.

Preflight in _setup_cluster() on the skip-cluster-management kind path:
- If no kind cluster is running, raise DeployerException pointing at
  --perform-cluster-management.
- If a different kind cluster is running, raise DeployerException
  showing both names and the two ways to reconcile (edit deployment.yml
  or recreate).
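
Roughly what the preflight amounts to (a sketch; helper name and message wording are illustrative):

```python
import subprocess


class DeployerException(Exception):
    pass  # stand-in for the project's exception type


def preflight_kind_cluster(expected: str) -> None:
    # `kind get clusters` prints one running cluster name per line
    out = subprocess.check_output(["kind", "get", "clusters"], text=True)
    running = [line for line in out.splitlines() if line.strip()]
    if not running:
        raise DeployerException(
            "No kind cluster is running; re-run with --perform-cluster-management "
            "or start the cluster first."
        )
    if expected not in running:
        raise DeployerException(
            f"deployment.yml expects kind cluster '{expected}' but found {running}; "
            "edit cluster-id in deployment.yml or recreate the cluster."
        )
```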

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…th too

The mount-compatibility check lived inside create_cluster(), which only
runs under --perform-cluster-management. Under the (default)
--skip-cluster-management path the check was skipped — a deployment
joining an existing cluster with an incompatible kind-config would
proceed and silently fall through to the node's overlay FS, which is
exactly the failure mode the check was designed to catch.

Rename _check_mounts_compatible → check_mounts_compatible (now public)
and call it from both paths in _setup_cluster().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The original error always prescribed cluster recreate. When the running
cluster already has an umbrella at /mnt, that's misleading — the right
fix is to align the new deployment to the existing umbrella (set
kind-mount-root to the cluster's umbrella source and move host paths
under it). Recreate is only correct when no umbrella exists.

Branch the error message on whether the cluster has a /mnt bind. With
umbrella: show its host source and tell the user to set kind-mount-root
to that value. Without: keep the recreate guidance.
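
Roughly, the branch looks like this (a sketch; the helper name and message text are illustrative):

```python
def _mount_mismatch_message(cluster: str, live_binds: dict, missing: list) -> str:
    lines = [f"extraMounts missing on running cluster '{cluster}':"]
    lines += [f"  {m['hostPath']} -> {m['containerPath']}" for m in missing]
    umbrella = live_binds.get("/mnt")
    if umbrella:
        lines.append(
            f"The cluster already mounts an umbrella at /mnt (host source: {umbrella}); "
            "set kind-mount-root to that path and move host paths under it."
        )
    else:
        lines.append(
            "No /mnt umbrella exists on the running cluster; "
            "recreate the cluster to add new host-path mounts."
        )
    return "\n".join(lines)
```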

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t check

Compose volumes like './config/x.sh' are emitted per-deployment with
containerPath '/mnt/host-path-<sanitized>' and source paths scoped to
each deployment's own directory. Two deployments of the same stack will
always clash at those containerPaths regardless of kind-mount-root —
this is a pre-existing SO aliasing behavior for file-level binds,
orthogonal to umbrella compatibility.

Let the mount-compatibility check skip '/mnt/host-path-*' entries so
the positive case (shared umbrella across deployments) doesn't
false-positive. The check still covers the /mnt umbrella itself and
named-volume data mounts.
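
In essence the check just filters those entries out before comparing (a sketch; the exact mechanism is assumed):

```python
from fnmatch import fnmatch


def _mounts_to_compare(extra_mounts: list) -> list:
    # per-deployment file-bind aliases always differ between deployments;
    # exclude them so only the umbrella and named-volume mounts are compared
    return [m for m in extra_mounts
            if not fnmatch(m["containerPath"], "/mnt/host-path-*")]
```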

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
File-level host-path compose volumes (e.g. `../config/foo.sh:/opt/foo.sh`)
were synthesized into a kind extraMount + hostPath PV chain with a
sanitized containerPath (`/mnt/host-path-<sanitized>`). The sanitized
name is derived from the compose volume source and is identical across
deployments of the same stack, so two deployments sharing a cluster
collided at the containerPath — kind only honors the first deployment's
bind, subsequent deployments' pods silently read the first's content.
The same code path was also broken on real k8s, which has no way to
populate `/mnt/host-path-*` on worker nodes.

File-level compose binds are conceptually k8s ConfigMaps. The snowball
stack already uses the ConfigMap-backed named-volume pattern by hand.
Make that automatic at the k8s object-generation layer, without
touching deployment-dir compose or spec files.

Behavior at deploy create (validation only, no file mutation):
- :rw on a host-path bind        -> DeployerException (use a named
                                     volume for writable data)
- Directory with subdirectories  -> DeployerException (embed in image,
                                     split into configmaps, or use
                                     initContainer)
- Directory or file > ~700 KiB   -> DeployerException (ConfigMap budget)
- File, or flat small directory  -> accepted, handled at deploy start
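
A condensed sketch of what those rules amount to (function name, argument shape, and messages assumed; the real logic is _validate_host_path_mounts in deployment_create):

```python
import os

CONFIGMAP_BUDGET = 700 * 1024  # headroom for base64 + metadata under the 1 MiB etcd limit


class DeployerException(Exception):
    pass  # stand-in for the project's exception type


def validate_host_path_bind(source: str, mode: str) -> None:
    if mode == "rw":
        raise DeployerException(
            f"{source}: writable host-path binds are not supported; use a named volume")
    if os.path.isdir(source):
        entries = [os.path.join(source, e) for e in os.listdir(source)]
        if any(os.path.isdir(e) for e in entries):
            raise DeployerException(
                f"{source}: directories with subdirectories cannot become a single ConfigMap")
        size = sum(os.path.getsize(e) for e in entries)
    else:
        size = os.path.getsize(source)
    if size > CONFIGMAP_BUDGET:
        raise DeployerException(
            f"{source}: {size} bytes exceeds the ~700 KiB ConfigMap budget")
```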

Behavior at deploy start:
- cluster_info.get_configmaps() additionally walks pod + job compose
  volumes and emits a V1ConfigMap per host-path bind (deduped by
  sanitized name across all pods/services). Content read from
  {deployment_dir}/config/<pod>/<file> (already populated by
  _copy_extra_config_dirs).
- volumes_for_pod_files emits V1ConfigMapVolumeSource instead of
  V1HostPathVolumeSource for host-path binds.
- volume_mounts_for_service stats the source and sets V1VolumeMount
  sub_path to the filename when source is a regular file — single-key
  ConfigMaps land as files, whole-dir ConfigMaps land as directories.
- _generate_kind_mounts no longer emits `/mnt/host-path-*` extraMounts
  for these binds (the ConfigMap path bypasses kind node FS entirely).
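
A rough sketch of the two pieces (helper names and exact wiring assumed; the real code lives in cluster_info and helpers):

```python
import os

from kubernetes import client


def configmap_for_bind(cm_name: str, namespace: str, source: str) -> client.V1ConfigMap:
    # one ConfigMap per host-path bind, keyed by filename
    files = [source] if os.path.isfile(source) else [
        os.path.join(source, f) for f in os.listdir(source)]
    data = {os.path.basename(f): open(f).read() for f in files}
    return client.V1ConfigMap(
        metadata=client.V1ObjectMeta(name=cm_name, namespace=namespace), data=data)


def volume_and_mount_for_bind(cm_name: str, source: str, container_path: str):
    volume = client.V1Volume(
        name=cm_name, config_map=client.V1ConfigMapVolumeSource(name=cm_name))
    mount = client.V1VolumeMount(name=cm_name, mount_path=container_path)
    if os.path.isfile(source):
        # single-key ConfigMap should land as a file at container_path, not a directory
        mount.sub_path = os.path.basename(source)
    return volume, mount
```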

Deployment dir layout is unchanged. Compose files, spec.yml, and
{deployment_dir}/config/<pod>/ remain exactly as today — trivially
diffable against stack source, no synthetic volume names. ConfigMaps
are visible only in k8s (kubectl get cm -n <ns>).

The existing `/mnt/host-path-*` skip in check_mounts_compatible is
retained as a transition tolerance for deployments created before
this change.

Updates:
- deployment_create: _validate_host_path_mounts() called per pod/job
  in the create loops; 700 KiB ConfigMap budget (accounts for base64
  + metadata overhead)
- helpers: _generate_kind_mounts skips host-path entries;
  volumes_for_pod_files emits ConfigMap-backed V1Volume;
  volume_mounts_for_service takes optional deployment_dir and
  auto-sets sub_path for single-file sources
- cluster_info: new _host_path_bind_configmaps() walked from
  get_configmaps(); volume_mounts_for_service call passes
  deployment_dir from spec.file_path
- docs: document the behavior and the rejected shapes in
  deployment_patterns.md
- tests: k8s-deploy asserts the host-path ConfigMaps exist,
  compose/spec unchanged, and no `/mnt/host-path-*` extraMounts

Refs: so-b86

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…olumes

Implementation on this branch at commit cb84388.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nt override

Two deployments whose stack_name derives the same namespace (e.g. two
deployments of the test stack, or any spec without an explicit
`namespace:` override) silently patch each other's Deployment,
ConfigMaps, Services, and PVCs when they share a cluster — last
`deployment start` wins. No error today; operator sees only "Updated
Deployment ... (rolling update)" and can't tell what happened.

Stamp the namespace with a `laconic.com/deployment-dir` annotation on
first creation. On subsequent `deployment start`:

- Annotation missing (legacy / user-created namespace): adopt by
  stamping, so the NEXT conflicting deployment fails loudly.
- Annotation matches this deployment's dir: proceed.
- Annotation points to a different deployment dir: raise
  DeployerException with both dirs and the exact `namespace:` spec
  override to fix it.
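
A sketch of the ownership check (the annotation key comes from this commit; the helper name and messages are illustrative):

```python
from kubernetes import client

OWNER_ANNOTATION = "laconic.com/deployment-dir"


class DeployerException(Exception):
    pass  # stand-in for the project's exception type


def assert_namespace_ownership(api: client.CoreV1Api, namespace: str, deployment_dir: str) -> None:
    ns = api.read_namespace(namespace)
    owner = (ns.metadata.annotations or {}).get(OWNER_ANNOTATION)
    if owner is None:
        # legacy / user-created namespace: adopt it so the NEXT conflict fails loudly
        api.patch_namespace(
            namespace,
            {"metadata": {"annotations": {OWNER_ANNOTATION: deployment_dir}}})
    elif owner != deployment_dir:
        raise DeployerException(
            f"Namespace '{namespace}' belongs to deployment {owner}, not {deployment_dir}; "
            "set a distinct 'namespace:' override in the spec.")
```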

Low migration risk: the woodburn pattern (multiple stacks, each with
its own stack_name-derived namespace) continues to work — those
namespaces don't collide by construction. Only same-stack+same-cluster
deployments are affected, which never worked correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tell the user the valid namespace-name shape so they don't pick a
suffix that k8s will reject (uppercase, underscores, etc.).
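
The shape in question is the standard DNS-1123 label rule Kubernetes enforces on namespace names; for illustration:

```python
import re

# namespaces must be valid DNS-1123 labels: lowercase alphanumerics and '-',
# starting and ending with an alphanumeric, at most 63 characters
DNS1123_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?$")

assert DNS1123_LABEL.match("my-stack-ns")
assert not DNS1123_LABEL.match("My_Stack")  # uppercase and underscores are rejected
```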

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cluster-id plays two roles today: (a) which kind cluster this
deployment attaches to (used for the kube-config context name) and
(b) compose_project_name -> app_name, the prefix for every k8s
resource the deployment creates. _get_existing_kind_cluster() in
deploy create forces (a) to inherit the running cluster's name, and
because (a) and (b) are the same field, (b) inherits too — so two
deployments that share a cluster also share an app_name and collide
on every resource whose suffix isn't naturally distinct (PVs are
cluster-scoped; same-stack deployments collide there in particular).

Decouple: add a distinct `deployment-id` field. cluster-id keeps its
current behavior (inherit running cluster, else fresh). deployment-id
is always fresh per `deploy create`. K8sDeployer sources
kind_cluster_name from cluster-id and app_name from deployment-id.
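
Sketched in terms of the fields involved (attribute and parameter names assumed):

```python
class DeploymentContext:
    def init(self, obj: dict) -> None:
        self.cluster_id = obj["cluster-id"]
        # backward-compat fallback (detailed below): pre-decouple deployment.yml
        # files carry only cluster-id, so existing resource names stay stable
        self.deployment_id = obj.get("deployment-id", self.cluster_id)

# K8sDeployer then sources the two roles separately (sketch):
#   kind_cluster_name = context.cluster_id     # which kind cluster to attach to
#   app_name          = context.deployment_id  # prefix for k8s resource names
```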

Backward compatibility:
- Existing deployment.yml files have only cluster-id; no on-disk
  change until the next `deploy create`.
- DeploymentContext.init() falls back: deployment-id = cluster-id
  when the field is absent. Existing deployments keep their current
  app_name and resource names on next start — no PV renames, no
  re-binds, no data orphaning.
- `compose_project_name` parameter to K8sDeployer is retained (still
  used by the compose deployer path); only the k8s-side internals
  switch to deployment_context getters.
- The helm chart generator continues to derive chart names from
  cluster-id; untouched here, worth a follow-up for consistency.

Effect on woodburn: dumpster/rpc/trashscan each already carry a
distinct cluster-id in their deployment.yml (pre-`_get_existing_kind_cluster`
era). Under the fallback, they all adopt their existing cluster-id
as deployment-id, so resource names are identical to today.

Effect on new deployments: even when they share a running cluster
(kind-cluster-name in kube-config matches cluster-id), they get
distinct deployment-ids at deploy create, and thus distinct resource
name prefixes. The same-stack PV collision the namespace ownership
check surfaces goes away by construction.

Test: run-deploy-test.sh now reads deployment-id from the new field,
falling back to cluster-id for pre-decouple fixtures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Describe the two identifiers, their different roles (kube context
attachment vs k8s resource name prefix), the collision-avoidance
rationale, backward-compat fallback for pre-decouple deployment.yml
files, and the namespace ownership annotation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same test-fixture fix as tests/k8s-deploy/run-deploy-test.sh: after
the cluster-id/deployment-id decouple, deployment.yml has two lines
instead of one, so `cut -d ' ' -f 2` yields a multi-value string.
The assignment to $deployment_id then corrupted the kubectl label
selector — kubectl saw "a name AND a selector" and refused.

Split the extraction by field, with deployment-id falling back to
cluster-id when the field is absent (pre-decouple deployment.yml).
Use cluster-id for the kind worker node name; deployment-id for the
app= label selector.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
prathamesh0 merged commit 421b83c into main on Apr 21, 2026
6 checks passed
prathamesh0 deleted the feat/so-b86-auto-configmap-host-path branch on April 21, 2026 at 06:47