k8s: shared-cluster safety checks and deployment-id decoupling (#748)
- **Kind extraMount compatibility**: fail fast at `deployment start` when a new deployment's mounts don't match the running cluster; warn when the first cluster is created without a `kind-mount-root` umbrella; replace the cryptic `ConfigException` with readable errors when the cluster is missing
- **Auto-ConfigMap for file-level host-path compose volumes** (so-7fc): `../config/foo.sh:/opt/foo.sh`-style binds become per-namespace ConfigMaps at deploy start instead of aliasing via the kind extraMount chain. `deploy create` rejects `:rw`, subdirs, and over-budget sources. Deployment-dir layout unchanged
- **Namespace ownership**: stamp the namespace with `laconic.com/deployment-dir` on create; fail loudly if another deployment tries to land in the same namespace. Pre-existing namespaces adopt ownership on next start
- **deployment-id / cluster-id decoupling**: split the two roles (kube context vs resource-name prefix) into separate `deployment.yml` fields. Backward-compat fallback keeps existing resource names stable
- Close stale pebbles `so-n1n` and `so-ad7`
`.pebbles/events.jsonl` (+6 lines)

@@ -46,3 +46,9 @@
{"type":"comment","timestamp":"2026-04-17T08:13:32.753112339Z","issue_id":"so-o2o","payload":{"body":"Tested the version-detection fix (commit 832ab66d) locally. Fix works for its scope but surfaces two more bugs downstream. Current approach is broken at the architectural level, not just one-bug-fixable.\n\nWhat 832ab66d does: captures etcd image ref from crictl after cluster create, writes to {backup_dir}/etcd-image.txt, reads it on subsequent cleanup runs. Self-adapts to Kind upgrades. No more hardcoded v3.5.9. Confirmed locally: etcd-image.txt is written after first create, cleanup on second start uses it, member.backup-YYYYMMDD-HHMMSS dir is produced (proves cleanup ran end-to-end).\n\nWhat still fails after version fix: kubeadm init on cluster recreate. apiserver comes up but returns:\n- 403 Forbidden: User \"kubernetes-admin\" cannot get path /livez\n- 500: Body was not decodable ... json: cannot unmarshal array into Go value of type struct\n- eventually times out waiting for apiserver /livez\n\nTwo new bugs behind those:\n\n(a) Restore step corrupts binary values. In _clean_etcd_keeping_certs the restore loop is:\n key=$(echo $encoded | base64 -d | jq -r .key | base64 -d)\n val=$(echo $encoded | base64 -d | jq -r .value | base64 -d)\n echo \"$val\" | /backup/etcdctl put \"$key\"\nk8s stores objects as protobuf. Piping raw protobuf through bash variable expansion + echo mangles non-printable bytes, truncates at null bytes, and appends a trailing newline. Explains the \"cannot unmarshal\" from apiserver — the kubernetes Service/Endpoints objects in /registry are corrupted on re-put.\n\n(b) Whitelist is too narrow. We keep only /registry/secrets/caddy-system and the /registry/services entries for kubernetes. Everything else is deleted — including /registry/clusterrolebindings (cluster-admin is gone), /registry/serviceaccounts, /registry/secrets/kube-system (bootstrap tokens), RBAC roles, apiserver's auth config. Explains the 403 for kubernetes-admin — cluster-admin binding doesn't exist yet and kubeadm's pre-addon health check can't authorize.\n\nFixing (a) would mean rewriting the restore step to not use shell piping — either use a proper etcdctl-based Go tool, or write directly to the on-disk snapshot format. Fixing (b) means exhaustively whitelisting everything kubeadm/apiserver bootstrapping needs — a moving target across k8s versions. Both together are a significant undertaking for the actual requirement (\"keep 4 Caddy secrets across cluster recreate\").\n\nDecision: merge 832ab66d for the narrow version-detection fix + diagnosis trail, then implement the kubectl-level backup/restore on a separate branch. The etcd approach is not salvageable at reasonable cost."}}
{"type":"comment","timestamp":"2026-04-17T11:04:26.542659482Z","issue_id":"so-o2o","payload":{"body":"Shipped in PR #746. Etcd-persistence approach replaced with a kubectl-level Caddy Secret backup/restore gated on kind-mount-root.\n\nSummary of what landed:\n- components/ingress/caddy-cert-backup.yaml: SA/Role/RoleBinding + CronJob (alpine/kubectl:1.35.3) firing every 5min, writes {kind-mount-root}/caddy-cert-backup/caddy-secrets.yaml via atomic tmp+rename.\n- install_ingress_for_kind splits into 3 phases: pre-Deployment manifests → _restore_caddy_certs (kubectl apply from backup file) → Caddy Deployment → _install_caddy_cert_backup. Caddy pod can't exist until phase 3, so certs are always in place before secret_store startup.\n- Deleted _clean_etcd_keeping_certs, _get_etcd_host_path_from_kind_config, _capture_etcd_image, _read_etcd_image_ref, _etcd_image_ref_path and the etcd+PKI block in _generate_kind_mounts.\n- No new spec keys.\n\nTest coverage in tests/k8s-deploy/run-deploy-test.sh: install assertion after first --perform-cluster-management start, plus full E2E (seed fake manager=caddy Secret → trigger CronJob → verify backup file → stop/start --perform-cluster-management for cluster recreate → assert secret restored with matching decoded value).\n\nWoodburn migration: one-shot host-kubectl export to seed {kind-mount-root}/caddy-cert-backup/caddy-secrets.yaml was done manually on the running cluster (the in-cluster CronJob couldn't reach the host because the /srv/kind → /mnt extraMount was staged in kind-config.yml but never applied to the running cluster — it was added after cluster creation). File is in place for the eventual cluster recreate."}}
{"type":"create","timestamp":"2026-04-20T13:14:26.312724048Z","issue_id":"so-7fc","payload":{"description":"## Problem\n\nFile-level host-path compose volumes (e.g. `../config/foo.sh:/opt/foo.sh`) were synthesized into a kind extraMount + k8s hostPath PV chain with a sanitized containerPath (`/mnt/host-path-\u003csanitized\u003e`).\n\n- On kind: two deployments of the same stack sharing a cluster collide at that containerPath — kind only honors the first deployment's bind, so subsequent deployments' pods silently read the first's file. No error, no warning.\n- On real k8s: the same code emits `hostPath: /mnt/host-path-*` but nothing populates that path on worker nodes — effectively broken.\n\nFile-level host-path binds are conceptually k8s ConfigMaps. The `snowballtools-base-backend` stack already uses the ConfigMap-backed named-volume pattern manually; this issue is to make that automatic for all stacks.\n\n## Resolution\n\nImplemented on branch `feat/so-b86-auto-configmap-host-path` (commit `cb84388d`), stacked on top of `feat/kind-mount-invariant-check`.\n\n**No deployment-dir file rewriting.** Compose files, spec.yml, and `{deployment_dir}/config/\u003cpod\u003e/` are untouched — trivially diffable against stack source, no synthetic volume names. ConfigMaps are materialized at deploy start and visible only in k8s (`kubectl get cm -n \u003cns\u003e`).\n\n### Deploy create — validation only\n\n| Source shape | Behavior |\n|---|---|\n| Single file | Accepted |\n| Flat directory, no subdirs, ≤ ~700 KiB | Accepted |\n| Directory with subdirs | `DeployerException` — guidance: embed in image / split configmaps / initContainer |\n| File or directory \u003e ~700 KiB | `DeployerException` — ConfigMap budget (accounts for base64 + metadata) |\n| `:rw` on any host-path bind | `DeployerException` — use a named volume for writable data |\n\n### Deploy start — k8s object generation\n\n- `cluster_info.get_configmaps()` walks pod + job compose volumes and emits a `V1ConfigMap` per host-path bind (deduped by sanitized name), content read from `{deployment_dir}/config/\u003cpod\u003e/\u003cfile\u003e`.\n- `volumes_for_pod_files` emits `V1ConfigMapVolumeSource` instead of `V1HostPathVolumeSource` for host-path binds.\n- `volume_mounts_for_service` stats the source and sets `V1VolumeMount.sub_path` to the filename when source is a regular file.\n- `_generate_kind_mounts` no longer emits `/mnt/host-path-*` extraMounts — ConfigMap path bypasses the kind node FS entirely.\n\n### Transition\n\nThe `/mnt/host-path-*` skip in `check_mounts_compatible` is retained as a transition tolerance for deployments created before this change. Test coverage in `tests/k8s-deploy/run-deploy-test.sh` asserts host-path ConfigMaps exist in the namespace, compose/spec in deployment dir unchanged, and no `/mnt/host-path-*` entries in kind-config.yml.","priority":"2","title":"File-level host-path compose volumes alias across deployments sharing a kind cluster","type":"bug"}}
{"type":"comment","timestamp":"2026-04-21T05:57:12.476299839Z","issue_id":"so-n1n","payload":{"body":"Already merged: 929bdab8 is an ancestor of origin/main; all four extraMount emit sites in helpers.py carry `propagation: HostToContainer` (umbrella, per-volume named, per-volume host-path, high-memlock spec)."}}
{"type":"comment","timestamp":"2026-04-21T06:08:13.933886638Z","issue_id":"so-ad7","payload":{"body":"Fixed in PR #744 (cf8b7533). get_services() now includes the maintenance pod in the container-ports map so its per-pod Service is built and available for the Ingress swap."}}
@@ -200,3 +240,100 @@ Empty-path volumes appear persistent because they survive pod restarts (data liv
in Kind Node container). However, this data is lost when the kind cluster is
recreated. This "false persistence" has caused data loss when operators assumed
their data was safe.

### Shared Clusters: Use `kind-mount-root`

Because kind `extraMounts` can only be set at cluster creation, the first
deployment to start locks in the mount topology. Later deployments that
declare new `extraMounts` have them silently ignored — their PVs fall
through to the kind node's overlay filesystem and lose data on cluster
destroy.

The fix is an umbrella mount. Set `kind-mount-root` in the spec, pointing
at a host directory all stacks will share:

```yaml
# spec.yml
kind-mount-root: /srv/kind

volumes:
  my-data: /srv/kind/my-stack/data  # visible at /mnt/my-stack/data in-node
```

SO emits a single `extraMount` (`<kind-mount-root>` → `/mnt`). Any new
host subdirectory under the root is visible in the node immediately — no
cluster recreate needed to add stacks.

**All stacks sharing a cluster must agree on `kind-mount-root`** and keep
their host paths under it.

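The generated kind config then carries a single umbrella entry instead of one `extraMount` per volume. A minimal sketch of that config, assuming a single-node cluster (field names follow kind's `v1alpha4` API):

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    extraMounts:
      - hostPath: /srv/kind    # the spec's kind-mount-root
        containerPath: /mnt    # fixed in-node root the volume paths resolve under
```
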
### Mount Compatibility Enforcement

`laconic-so deployment start` validates mount topology:

- **On first cluster creation** without an umbrella mount: prints a
  warning (future stacks may require a full recreate to add mounts).
- **On cluster reuse**: compares the new deployment's `extraMounts`
  against the live mounts on the control-plane container. Any mismatch
  (wrong host path, or mount missing) fails the deploy.

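The reuse check amounts to comparing the mounts a new deployment declares against the mounts observed on the live control-plane container. A minimal sketch of that comparison, assuming simple `containerPath -> hostPath` dicts (SO's actual `check_mounts_compatible` operates on richer structures):

```python
def check_mounts_compatible(wanted, live):
    """Return human-readable errors for every extraMount mismatch.

    wanted: containerPath -> hostPath declared by the new deployment
    live:   containerPath -> hostPath observed on the running
            control-plane container (e.g. via docker inspect)
    """
    errors = []
    for container_path, host_path in wanted.items():
        if container_path not in live:
            errors.append(
                f"mount {container_path} is missing from the running cluster; "
                "kind extraMounts cannot be added without a cluster recreate")
        elif live[container_path] != host_path:
            errors.append(
                f"mount {container_path} is backed by {live[container_path]} "
                f"in the running cluster, not {host_path}")
    return errors
```

An empty result means the deployment can safely join the shared cluster; any entry fails the deploy with a readable message rather than a cryptic error.
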
### Static files in compose volumes → auto-ConfigMap

Compose volumes that bind a host file or flat directory into a container
(e.g. `../config/test/script.sh:/opt/run.sh`) are used to inject static
content that ships with the stack. k8s doesn't have a native notion of
this — the canonical way to inject static content is a ConfigMap.

At `deploy start`, laconic-so auto-generates a namespace-scoped
ConfigMap per host-path compose volume (deduped by source) and mounts
it into the pod instead of routing the bind through the kind node:

| Source shape | Behavior |
|---|---|
| Single file | ConfigMap with one key (the filename); pod mount uses `subPath` so the single key lands at the compose target path |
| Flat directory (no subdirs, ≤ ~700 KiB) | ConfigMap with one key per file; pod mount exposes all keys at the target path |
| Directory with subdirs, or over budget | Rejected at `deploy create` — embed in the container image, split into multiple ConfigMaps, or use an initContainer |
| `:rw` on any host-path bind | Rejected at `deploy create` — use a named volume with a spec-configured host path for writable data |

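For the single-file row, the generated pod spec looks roughly like the fragment below (the volume and ConfigMap names here are illustrative; SO derives the real name by sanitizing the source path):

```yaml
# illustrative fragment for `../config/test/script.sh:/opt/run.sh`
volumes:
  - name: run-script            # hypothetical sanitized name
    configMap:
      name: run-script
containers:
  - name: app
    volumeMounts:
      - name: run-script
        mountPath: /opt/run.sh
        subPath: script.sh      # the single key lands at the compose target path
```
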
The deployment dir layout is unchanged: compose files stay verbatim and
`spec.yml` is not rewritten. Source files remain under
`{deployment_dir}/config/{pod}/` (as copied by `deploy create`); the
ConfigMap is built from them at deploy start and no kind extraMount is
emitted for these paths.

This works identically on kind and real k8s (ConfigMaps are
cluster-native; no node-side landing pad required), and two deployments
of the same stack sharing a cluster get their own per-namespace
ConfigMaps — no aliasing.

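The `deploy create` rejections in the table above come down to a little filesystem inspection. A sketch of that validation, assuming a plain function and a ~700 KiB budget constant (the exact budget and error wording in SO differ):

```python
import os

CONFIGMAP_BUDGET = 700 * 1024  # approximate; leaves room for base64 + metadata

def validate_host_path_source(source_path, mode="ro"):
    """Raise ValueError when a host-path compose bind can't become a ConfigMap."""
    if mode == "rw":
        raise ValueError("':rw' host-path binds are rejected; "
                         "use a named volume for writable data")
    if os.path.isfile(source_path):
        total = os.path.getsize(source_path)
    else:
        total = 0
        for entry in os.scandir(source_path):
            if entry.is_dir():
                raise ValueError("directories with subdirectories are rejected; "
                                 "embed in the image, split ConfigMaps, "
                                 "or use an initContainer")
            total += entry.stat().st_size
    if total > CONFIGMAP_BUDGET:
        raise ValueError(f"source is {total} bytes, over the "
                         f"~{CONFIGMAP_BUDGET} byte ConfigMap budget")
```
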
### Writable / generated data → named volume + host path

For volumes the workload *writes to* (databases, ledgers, caches, logs),
use a named volume backed by a spec-configured host path under
`kind-mount-root`:

```yaml
# compose
volumes:
  - my-data:/var/lib/foo

# spec.yml
kind-mount-root: /srv/kind
volumes:
  my-data: /srv/kind/my-stack/data
```

Works on both kind (via the umbrella mount) and real k8s (operator
provisions `/srv/kind/my-stack/data` on each node).

### Migrating an Existing Cluster

If a cluster was created without an umbrella mount and you need to add a
stack that requires new host-path mounts, the cluster must be recreated:

1. Back up ephemeral state (DBs, caches) from PVs that lack host mounts —
   these are in the kind node overlay FS and do not survive `kind delete`.
2. Update every stack's spec to set a shared `kind-mount-root` and place
   host paths under it.
3. Stop all deployments, destroy the cluster, recreate it by starting any
0 commit comments