Chart Version: 0.1.0
App Version: 1.0.11
Created: April 2026
Purpose: Production-ready Helm chart for deploying Solace Cache instances (Linux C API application)
- These cache instances are stateful "pets" that must maintain maximum uptime
- Each pod has a unique identity and specific configuration
- Focus on HA protection and graceful handling of disruptions
- Use StatefulSet for stable pod identities and ordered deployment
- StatefulSet - Main workload (not Deployment)
- Headless Service - DNS resolution for pods (no inbound ports needed)
- ConfigMap - Template config with placeholders for per-pod substitution
- Secret - Broker credentials (username/password)
- PodDisruptionBudget - Prevents voluntary disruptions from affecting availability
- Init Container - Generates unique config per pod before main container starts
Problem: Each cache instance needs a unique CACHE_INSTANCE_NAME that matches the broker's distributed cache configuration.
Solution: A busybox init container extracts the pod ordinal and selects from the
instanceNames list (passed in by Helm as a space-separated string), then substitutes
placeholders in the config template. Credentials are read per-ordinal from a mounted
secret. No yq dependency — plain POSIX shell:
# In init container (see templates/statefulset.yaml)
POD_ORDINAL=$(echo $HOSTNAME | grep -o '[0-9]*$')
INSTANCE_NAMES="{{ join " " .Values.solaceCache.instanceNames }}"
INSTANCE_NAME=$(echo $INSTANCE_NAMES | cut -d' ' -f$((POD_ORDINAL + 1)))

Substitution is done with awk, not sed: the values (instance name plus
__SESSION_USERNAME__ / __SESSION_PASSWORD__ read per-ordinal from the mounted secret)
are exported into the environment and replaced with a literal substring subst() function.
This avoids sed mangling credentials that contain its delimiter, the & replacement
metachar, or a backslash:
# values passed via ENVIRON, replaced literally (no regex/delimiter interpretation)
$0 = subst($0, "__CACHE_INSTANCE_NAME__", ENVIRON["INSTANCE_NAME"])
$0 = subst($0, "__SESSION_USERNAME__", ENVIRON["USERNAME"])
$0 = subst($0, "__SESSION_PASSWORD__", ENVIRON["PASSWORD"])

Problem: Container was taking 30 seconds to shut down (SIGTERM timeout).
Solution: The container runs a wrapper script (templates/wrapper-script-configmap.yaml,
mounted at /scripts/cache-wrapper.sh) that launches SolaceCache in the background and
installs a trap on TERM/INT to forward SIGTERM to the cache process and wait for it
to exit cleanly:
trap "kill -TERM ${CACHE_PID}; wait ${CACHE_PID}; ...; exit 0" TERM INT

The wrapper also tails the cache logs to drive the readiness probe (see below).
Result: Shutdown time reduced from 30s to ~2s.
Note: an earlier approach used exec so that SolaceCache became PID 1 directly. That was replaced by the wrapper once we needed log-watching for readiness; the wrapper now owns PID 1 and is responsible for signal forwarding.
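The signal-forwarding pattern described above can be sketched roughly as follows. This is a minimal illustration and not the chart's actual cache-wrapper.sh: the function name run_with_forwarding is hypothetical, and the log tailing and readiness-marker handling that the real wrapper performs are left out.

```shell
#!/bin/sh
# Hypothetical sketch: run a command in the background and forward TERM/INT to it.
run_with_forwarding() {
    "$@" &                 # launch the child (SolaceCache in the real wrapper)
    child_pid=$!

    # On TERM/INT: pass the signal on, wait for the child to exit, then exit cleanly.
    trap 'kill -TERM "$child_pid" 2>/dev/null; wait "$child_pid"; exit 0' TERM INT

    # wait is interrupted by trapped signals, so the handler runs promptly
    # instead of after the child finishes on its own.
    wait "$child_pid"
}
```

Running the child in the background and blocking in wait is what lets the shell react to SIGTERM immediately; a shell blocked on a foreground child only runs its trap after that child exits.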
Problem: Kubernetes doesn't restart pods when a referenced ConfigMap changes.
Solution: Add a checksum annotation to the pod template:
annotations:
  checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}

Result: A config change advances the StatefulSet revision. Under
RollingUpdate this auto-restarts pods; under the default OnDelete it marks
pods as "needs update" so the operator can apply it on their own schedule (see
Gotcha #5). Either way, pods only pick up new config when they restart.
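One way to see which pods are still on an older revision after a staged change is to compare each pod's controller-revision-hash label with the StatefulSet's status.updateRevision. The sketch below assumes kubectl access to the namespace; the function name stale_pods is hypothetical, and the default label selector is taken from the verification commands elsewhere in this file.

```shell
#!/bin/sh
# Hypothetical helper: print pods whose revision differs from the StatefulSet's
# target (update) revision, i.e. pods that still need a restart.
stale_pods() {
    sts=$1
    selector=${2:-app.kubernetes.io/name=solace-cache}

    target=$(kubectl get statefulset "$sts" \
        -o jsonpath='{.status.updateRevision}') || return 1

    # Emit "<pod-name> <revision-hash>" per pod, then keep the mismatches.
    kubectl get pods -l "$selector" \
        -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.labels.controller-revision-hash}{"\n"}{end}' |
    while read -r pod rev; do
        [ "$rev" = "$target" ] || printf '%s\n' "$pod"
    done
}
```

Calling stale_pods solace-cache after a helm upgrade under OnDelete would list the pods that have not yet picked up the staged change; an empty result suggests every pod is already on the current revision.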
Liveness uses a process check. The process name is SolaceCache with capital S and C:
livenessProbe:
  exec:
    command: ["pgrep", "-f", "SolaceCache"]

NOT "solcache" or "solaceCache"; the name must match exactly.
Readiness is driven by the wrapper script, which watches the cache logs and toggles two marker files. The probe is ready only when both exist:
readinessProbe:
  exec:
    command: ["sh", "-c", "[ -f /tmp/cache-state-up ] && [ -f /tmp/lost-msg-clear ]"]

- /tmp/cache-state-up - created on "State changed to: UP", removed on any other state change.
- /tmp/lost-msg-clear - created on LOST_MSG_STATE_CLEAR, removed on LOST_MSG_STATE_SET.
This requires INFO-level CACHE_LOG_LEVEL (the wrapper greps the log lines), which is why
debugCacheLogLevel: false yields INFO rather than a quieter level.
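The marker-file logic can be illustrated with a small sketch. It is not the shipped wrapper: the function names update_markers and is_ready are hypothetical, the marker directory is passed in as a parameter (the chart uses /tmp), and the matching assumes the three log strings quoted above appear verbatim somewhere in each relevant log line.

```shell
#!/bin/sh
# Hypothetical sketch: read log lines on stdin and toggle marker files in DIR.
update_markers() {
    dir=$1
    while IFS= read -r line; do
        case $line in
            # First matching pattern wins, so UP must come before the generic case.
            *"State changed to: UP"*) : > "$dir/cache-state-up" ;;
            *"State changed to:"*)    rm -f "$dir/cache-state-up" ;;
            *LOST_MSG_STATE_CLEAR*)   : > "$dir/lost-msg-clear" ;;
            *LOST_MSG_STATE_SET*)     rm -f "$dir/lost-msg-clear" ;;
        esac
    done
}

# Same condition as the readiness probe: ready only when both markers exist.
is_ready() {
    [ -f "$1/cache-state-up" ] && [ -f "$1/lost-msg-clear" ]
}
```

In a wrapper, update_markers would be fed from whatever log stream the script tails. Because the markers are driven purely by log lines, a log level that suppresses those lines would leave the pod permanently not ready.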
- values.yaml - Base configuration with sensible defaults
  - 2 replicas (HA by default)
  - INFO log levels
  - PodDisruptionBudget enabled
  - Preferred pod anti-affinity
- values-standalone.yaml - Single instance for dev/UAT
  - 1 replica but with PDB protection
  - DEBUG log levels
  - Higher probe failure thresholds
- values-prod-ha.yaml - Production overrides only
  - Custom registry and image pull secrets
  - Higher resource limits (4Gi/4000m)
  - SSL enabled for broker connection
  - Required (strict) pod anti-affinity
- statefulset.yaml - Main workload with init container and wrapper-script container
- service.yaml - Headless service (clusterIP: None)
- configmap.yaml - Config template with placeholders
- wrapper-script-configmap.yaml - cache-wrapper.sh: launches SolaceCache, forwards signals, drives readiness markers
- secret.yaml - Broker credentials, keyed username-<ordinal>/password-<ordinal>
- poddisruptionbudget.yaml - HA protection (minAvailable: 1)
- serviceaccount.yaml - Optional RBAC
(No Helm test is shipped: the liveness probe already verifies the SolaceCache
process continuously, which is what a one-shot test would have checked. The
cache image is Ubuntu + the binary only - no kubectl - so an in-cluster
exec-based test would also need a kubectl image plus exec RBAC.)
- scripts/kubectl-backup-cache - SEMP-driven cache backup plugin
- scripts/kubectl-restore-cache - SEMP-driven cache restore plugin
- scripts/copy-cache-contents.sh - simpler standalone backup helper
- README.md - Complete usage guide
- QUICKSTART.md - Fast deployment instructions
- NOTES.txt - Post-install instructions displayed to user
- PROJECT_CONTEXT.md - This file
solaceCache:
  instanceNames:
    - "cache-instance-0"  # Must match broker config
    - "cache-instance-1"
  distributedCacheName: "my-distributed-cache"

  broker:
    host: "tcp://solace-broker:55555"
    vpn: "default"
    usernames:            # one per replica; single entry reused for all
      - "cache-user"
    passwords:            # one per replica; or use existingSecret
      - "cache-password"
    existingSecret: ""    # Recommended for production (keys username-N/password-N)

  settings:
    sdkLogLevel: "NOTICE"       # Solace API/SDK logging
    debugCacheLogLevel: false   # false => CACHE_LOG_LEVEL INFO (required by readiness wrapper); true => DEBUG

podDisruptionBudget:
  enabled: true
  minAvailable: 1  # At least 1 pod must remain during disruptions

helm install my-cache . -f values-standalone.yaml

- 1 replica with PDB enabled (maximum uptime)
- DEBUG logging for troubleshooting
- Suitable for non-production environments

helm install prod-cache . -f values-prod-ha.yaml

- 2 replicas on separate nodes (required anti-affinity)
- SSL-enabled broker connection
- Higher resource allocations
- Secrets-based credentials

helm upgrade my-cache . -f my-values.yaml

- Default strategy is OnDelete: this stages the change but does NOT restart pods. Apply it manually, one ordinal at a time: kubectl delete pod <name>-1, verify, then <name>-0.
- With updateStrategy.type: RollingUpdate, the upgrade auto-restarts pods in reverse ordinal order (1, then 0); PDB keeps ≥1 pod available during the roll.

helm lint .
helm template test . --debug

# Check pod status
kubectl get pods -l app.kubernetes.io/name=solace-cache
# View logs
kubectl logs -f solace-cache-0
kubectl logs -f solace-cache-1
# Check config generated correctly
kubectl exec solace-cache-0 -- cat /home/solace/config/config.txt

kubectl exec solace-cache-0 -- pgrep -f SolaceCache

Should return a PID. If the output is empty, the container is not running correctly.

- The liveness probe uses pgrep -f SolaceCache
- Must match capital S and C
- Manual checks (kubectl exec ... pgrep) use the same pattern

- The init container is busybox and parses with POSIX shell (grep/cut/awk), with no yq
- It extracts the pod ordinal from $HOSTNAME and picks the matching instance name
- The instanceNames list is injected by Helm at render time as a space-separated string
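The init container's steps (ordinal extraction, instance-name lookup, literal placeholder substitution) can be sketched as below. This is an illustration under assumptions, not the shipped init script: the function names are hypothetical, the name lookup swaps cut for positional-parameter splitting so that a too-short list can be detected, and the body of subst() is one plausible implementation of a literal replace built on awk's index() and substr().

```shell
#!/bin/sh
# Trailing digits of the pod hostname, e.g. solace-cache-1 -> 1.
pod_ordinal() {
    printf '%s\n' "$1" | grep -o '[0-9]*$'
}

# instance_name_for ORDINAL "name0 name1 ..."; fails if the list is too short.
instance_name_for() {
    ordinal=$1
    set -- $2                       # split the list on whitespace
    [ "$ordinal" -lt "$#" ] || return 1
    shift "$ordinal"
    printf '%s\n' "$1"
}

# Reads a template on stdin; expects INSTANCE_NAME, USERNAME and PASSWORD
# to be exported. Values are inserted literally (no regex, no & or \ handling).
render_config() {
    awk '
    function subst(s, from, to,    out, i) {
        out = ""
        while ((i = index(s, from)) > 0) {
            out = out substr(s, 1, i - 1) to
            s = substr(s, i + length(from))
        }
        return out s
    }
    {
        $0 = subst($0, "__CACHE_INSTANCE_NAME__", ENVIRON["INSTANCE_NAME"])
        $0 = subst($0, "__SESSION_USERNAME__", ENVIRON["USERNAME"])
        $0 = subst($0, "__SESSION_PASSWORD__", ENVIRON["PASSWORD"])
        print
    }'
}
```

Reading the values through ENVIRON rather than awk -v matters here: -v assignments have backslash escapes processed, whereas ENVIRON entries are taken as-is, so a password containing a backslash survives unchanged.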
- PDB with minAvailable: 1 on single replica prevents all voluntary disruptions
- This is intentional for "pets" philosophy
- Node drains will be blocked until PDB is deleted or pod is force-evicted

- Default values.yaml has placeholder: your-registry/solace-cache
- Must be updated before deployment
- Production values already override this

- Configurable via updateStrategy in values; defaults to OnDelete
- OnDelete: helm upgrade stages changes but does NOT restart pods; the operator applies them by deleting pods (kubectl delete pod <name>), one ordinal at a time, at a chosen time. Chosen as the default because cache instances are "pets" whose restart timing should be operator-controlled.
- RollingUpdate: Kubernetes auto-restarts pods (highest ordinal first) when the pod template changes; supports staged rollouts via rollingUpdate.partition. values-standalone.yaml uses this for dev convenience.
- Note: each pod generates its config once at startup (init container → emptyDir), so a changed ConfigMap has no effect on a running pod until it restarts under either strategy. The checksum/config pod annotation advances the StatefulSet revision on config change, which (a) auto-triggers restart under RollingUpdate and (b) marks pods as "needs update" (a staged-change signal) under OnDelete.
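The manual OnDelete procedure (delete one pod, verify, move to the next lower ordinal) could be scripted along these lines. This is a sketch under assumptions rather than a supported tool: the function name roll_statefulset is hypothetical, pod readiness via kubectl wait stands in for the "verify" step, and the retry count and timeouts are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: restart StatefulSet pods one at a time, highest ordinal first.
# usage: roll_statefulset <statefulset-name> <replica-count>
roll_statefulset() {
    sts=$1
    ordinal=$(( $2 - 1 ))

    while [ "$ordinal" -ge 0 ]; do
        pod="$sts-$ordinal"
        kubectl delete pod "$pod" || return 1

        # The replacement pod may not exist for a moment after deletion,
        # so retry the wait a few times before giving up.
        tries=0
        until kubectl wait --for=condition=Ready "pod/$pod" --timeout=120s; do
            tries=$(( tries + 1 ))
            [ "$tries" -lt 5 ] || return 1
            sleep 5
        done

        ordinal=$(( ordinal - 1 ))
    done
}
```

For the default two-replica install, roll_statefulset solace-cache 2 would restart solace-cache-1, wait for it to report Ready, and only then touch solace-cache-0. Stopping at the first failure keeps at least one untouched pod serving if a new revision does not come up.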
- Monitoring - Add Prometheus metrics if SolaceCache exposes them
- Backup Strategy - Document or automate cache state backup
- Multi-Region - Add topology spread constraints for zone awareness
- Readiness Gates - External validation before marking pod ready
- Config Validation - Pre-flight checks in init container
- Syslog Integration - Enable syslog forwarding for centralized logging
- HPA - Autoscaling disabled; replicas should be manually controlled
- Ingress - No inbound traffic; cache connects to broker only
- Persistence - Uses emptyDir; cache state is ephemeral per pod lifecycle
kubectl describe pod solace-cache-0
kubectl logs solace-cache-0 -c init-config  # Check init container

- Verify instanceNames array has enough entries for replica count
- Check secret exists if using existingSecret
- Ensure image is pullable from registry

- Verify the StatefulSet command runs the wrapper script (/scripts/cache-wrapper.sh), which forwards SIGTERM to SolaceCache
- Check which process is PID 1 (it should be the wrapper, not SolaceCache):
kubectl exec <pod> -- ps aux

- Check if checksum annotation is in statefulset.yaml pod template
- Verify ConfigMap was actually changed
- Force restart (under the default OnDelete strategy this only stages a new revision; delete the pods to apply it):
kubectl rollout restart statefulset/solace-cache

- Verify process name:
kubectl exec <pod> -- pgrep -f SolaceCache
- Check startup time; may need longer initialDelaySeconds
- Review logs for crash loops

kubectl get pdb
kubectl describe pdb solace-cache

- Expected behavior for "pets" with single replica
- Temporarily disable: kubectl delete pdb solace-cache
- Re-enable after drain: helm upgrade <release> . --reuse-values

cd /path/to/solace-cache
helm package .

Produces: solace-cache-0.1.0.tgz

helm install my-cache solace-cache-0.1.0.tgz -f my-values.yaml

Update Chart.yaml version field, then repackage:
version: 0.2.0       # Chart version
appVersion: 1.0.12   # Application version

- GitHub: Posted by user (April 2026)
- Chart Type: Application
- License: Not specified
- Maintainer: Add to Chart.yaml if needed
This Helm chart implements a production-ready deployment for Solace Cache with:
- ✅ Unique per-pod configuration via init container
- ✅ Fast graceful shutdown via signal-forwarding wrapper script
- ✅ Operator-controlled (manual) restarts by default; RollingUpdate optional
- ✅ HA protection with PodDisruptionBudget
- ✅ Node distribution via pod anti-affinity
- ✅ Multiple deployment profiles (dev/prod)
- ✅ Comprehensive documentation and testing
Philosophy: Treat cache instances as stateful "pets" requiring maximum uptime and careful handling during disruptions. The chart prioritizes availability and correctness over scalability.