feat(substrate): bump agent substrate to 0.0.8 #2140

Open
jjamroga wants to merge 1 commit into kagent-dev:main from jjamroga:jjamroga/bump-substrate-0.0.8

Conversation

@jjamroga jjamroga commented Jul 2, 2026

Collaborator

Summary

Bumps kagent's substrate dependency to substrate v0.0.8 and updates every actor call site for the new atespace-scoped identity model.

Code changes

  • ActorRef refactor — substrate v0.0.8 replaces the flat ActorId string field on Get/Create/Resume/Suspend/DeleteActorRequest (and the new PauseActorRequest) with ActorRef{Atespace, Name}. Every method on substrate.Client grew an atespace parameter, and every caller was updated to pass it through. Introduces an EnsureAtespace helper that idempotently creates an atespace before the first actor is created in it (substrate returns FailedPrecondition: Atespace X not found otherwise).
  • Atespace = k8s namespace (1:1 mapping) — for actors kagent creates, atespace is the SandboxAgent/AgentHarness's Kubernetes namespace. Golden actors continue to live in substrate's reserved ate-golden atespace. sandboxbackend.Handle gained an Atespace field so cross-reconcile lookups (GetStatus, DeleteAgentHarness) can address the actor.
  • DNS/Host header shape — atenet-router now expects <name>.<atespace>.actors.resources.substrate.ate.dev. ActorHost and GatewayRouterTarget signatures updated accordingly, propagated through the ACP proxy handler and the A2A round-tripper.
  • SnapshotsConfig defaults: bug fix surfaced by PR #2109 (feat(substrate): bump agent substrate to 0.0.7), which made ActorTemplate.spec immutable. Kagent's desired spec left OnPause/OnCommit as zero-value empty strings while the API server defaulted them to Full on admission, so apiequality.Semantic.DeepEqual(existing, desired) reported drift on every reconcile, causing an infinite delete/recreate loop on the ActorTemplate CR. Kagent now sets both fields to SnapshotScopeFull explicitly in agent_lifecycle.go (SandboxAgent) and lifecycle_actortemplate.go (AgentHarness).
  • HTTP API: SubstrateActorEntry.Atespace added to the /api/substrate/status response so the UI can surface the new field.
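
To make the identity model in the bullets above concrete, here is a minimal sketch of choosing an atespace for an actor. The `ActorRef` type below is a local stand-in rather than the generated substrate type, and `refFor` is a hypothetical helper name, not a function from this PR. Only the {Atespace, Name} shape, the namespace mapping, and the reserved `ate-golden` name come from the description.

```go
package main

import "fmt"

// ActorRef is a local stand-in for the {Atespace, Name} pair described above.
// It is not the generated substrate v0.0.8 type.
type ActorRef struct {
	Atespace string
	Name     string
}

// goldenAtespace is the reserved atespace named in the PR description.
const goldenAtespace = "ate-golden"

// refFor (hypothetical helper) maps an actor to its atespace:
// golden actors use the reserved atespace, all others use the
// Kubernetes namespace of the owning resource (1:1 mapping).
func refFor(namespace, name string, golden bool) ActorRef {
	if golden {
		return ActorRef{Atespace: goldenAtespace, Name: name}
	}
	return ActorRef{Atespace: namespace, Name: name}
}

func main() {
	fmt.Printf("%+v\n", refFor("kagent", "substrate-demo", false))
	fmt.Printf("%+v\n", refFor("kagent", "golden-1", true))
}
```

Keeping the mapping in one function means a call site cannot accidentally address a golden actor through a namespace-derived atespace.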

Helm / build ergonomics

  • SUBSTRATE_REPO override: Makefile, helm/kagent/Chart-template.yaml, and helm/kagent-crds/Chart-template.yaml now template the substrate subchart repository, so local dev can point at a self-published chart without hand-editing.
  • scripts/kind/setup-kind.sh — env-driven REG_NAME / REG_PORT / REG_SCHEME / REG_INTERNAL_PORT so an already-running local registry (HTTPS, non-standard port, etc.) can be reused instead of unconditionally creating kind-registry.
  • scripts/controller-digest-ldflags.sh: DIGEST_CURL_INSECURE=true env var to skip TLS verification when the digest resolver hits a self-signed local registry.

Verified locally

Full E2E on colima+kind+substrate v0.0.8 published charts:

  • SandboxAgent (declarative, runtime: go) reaches Ready=True, chat round-trip through the config-hashed session-actor path works.
  • AgentHarness (backend: openclaw) reaches Ready=True, chat via the shared-actor ACP path works.

Local E2E validation (substrate v0.0.8)

Verified against substrate v0.0.8's published OCI chart — no local substrate build required.

Prerequisites

  • Colima with --vm-type vz on Apple Silicon. Docker Desktop's linuxkit kernel breaks gVisor checkpoint/restore on arm64.
    brew install colima docker
    colima start --cpu 8 --memory 16 --disk 60 --vm-type vz
  • Docker Desktop OFF. It silently squats 127.0.0.1:5001 with its own registry the moment it launches, and every docker push localhost:5001/... from your Mac hits Docker Desktop's stale registry instead of colima's kind-registry.
    osascript -e 'quit app "Docker"'
    lsof -iTCP:5001 -sTCP:LISTEN   # must be empty
  • OPENAI_API_KEY exported in your shell.
  • This branch checked out.

1. Kind cluster + local registry

cd ~/Repositories/kagent
bash scripts/kind/setup-kind.sh

Creates a kagent kind cluster, a kind-registry container (host port 5001 → container port 5000), and installs a containerd hosts.toml on the node aliasing localhost:5001 → http://kind-registry:5000 so kubelet can pull our locally-built images.

2. Buildx builder on the kind network

Buildx's default --driver-opt network=host can't reach kind-registry by name. Recreate the builder on the kind docker network with a buildkitd config that marks the local registries HTTP so pushes don't try HTTPS:

docker buildx rm kagent-builder-v0.23.0 2>/dev/null || true
cat > /tmp/buildkitd.toml << 'BLDKD'
[registry."kind-registry:5000"]
  http = true
  insecure = true
[registry."localhost:5001"]
  http = true
  insecure = true
BLDKD
docker buildx create --name kagent-builder-v0.23.0 \
  --platform linux/amd64,linux/arm64 \
  --driver docker-container --driver-opt network=kind \
  --config /tmp/buildkitd.toml --use

3. Build + push kagent images

DOCKER_REGISTRY=kind-registry:5000 make build

This will fail at the build-controller step because controller-digest-ldflags.sh runs on the host, where kind-registry doesn't resolve. Everything up to and including the sandbox images is pushed successfully. Re-run just the controller step against the host-reachable localhost:5001 alias (same registry, content-addressed digests are identical):

DIGEST_LDFLAGS=$(CONTAINER_RUNTIME=docker \
  APP_IMG=localhost:5001/kagent-dev/kagent/app:v0.10.0-beta1-14-g841e8077 \
  APP_FULL_IMG=localhost:5001/kagent-dev/kagent/app:v0.10.0-beta1-14-g841e8077-full \
  GOLANG_ADK_IMG=localhost:5001/kagent-dev/kagent/golang-adk:v0.10.0-beta1-14-g841e8077 \
  GOLANG_ADK_FULL_IMG=localhost:5001/kagent-dev/kagent/golang-adk:v0.10.0-beta1-14-g841e8077-full \
  ACP_SANDBOX_OPENCLAW_IMG=localhost:5001/kagent-dev/kagent/acp-sandbox-openclaw:v0.10.0-beta1-14-g841e8077 \
  ACP_SANDBOX_HERMES_IMG=localhost:5001/kagent-dev/kagent/acp-sandbox-hermes:v0.10.0-beta1-14-g841e8077 \
  bash scripts/controller-digest-ldflags.sh)

docker buildx build --push --platform linux/arm64 \
  --build-arg VERSION=v0.10.0-beta1-14-g841e8077 \
  --build-arg LDFLAGS="-X github.com/kagent-dev/kagent/go/core/internal/version.Version=v0.10.0-beta1-14-g841e8077 -X github.com/kagent-dev/kagent/go/core/internal/version.GitCommit=841e8077 -X github.com/kagent-dev/kagent/go/core/internal/version.BuildDate=$(date -u '+%Y-%m-%d') ${DIGEST_LDFLAGS}" \
  --build-arg DOCKER_REPO=kagent-dev/kagent \
  --build-arg DOCKER_REGISTRY=kind-registry:5000 \
  --build-arg BASE_IMAGE_REGISTRY=cgr.dev \
  --build-arg TOOLS_GO_VERSION=1.26.3 \
  --build-arg TOOLS_UV_VERSION=0.10.4 \
  --build-arg TOOLS_PYTHON_VERSION=3.13 \
  --build-arg TOOLS_NODE_VERSION=24 \
  --build-arg BUILD_PACKAGE=core/cmd/controller/main.go \
  -t kind-registry:5000/kagent-dev/kagent/controller:v0.10.0-beta1-14-g841e8077 \
  -f go/Dockerfile go/

4. Install kagent CRDs (with substrate CRDs enabled)

helm upgrade --install kagent-crds helm/kagent-crds \
  --namespace kagent --create-namespace \
  --kube-context kind-kagent --wait --timeout 3m \
  --set kmcp.enabled=true \
  --set substrate.enabled=true

5. Install kagent + substrate v0.0.8

Pulls the substrate subchart from oci://ghcr.io/kagent-dev/substrate/helm (the default) at the version pinned in go/go.mod. No local substrate build.

helm upgrade --install kagent helm/kagent \
  --namespace kagent --kube-context kind-kagent --wait --timeout 10m \
  --set registry=localhost:5001 \
  --set tag=v0.10.0-beta1-14-g841e8077 \
  --set imagePullPolicy=Always \
  --set providers.openAI.apiKey="$OPENAI_API_KEY" \
  --set providers.default=openAI \
  --set kmcp.enabled=true --set kmcp.image.tag=0.3.0 \
  --set substrate.enabled=true \
  --set-json 'substrate.atelet.extraArgs=["--localhost-registry-replacement=kind-registry:5000"]' \
  --set controller.substrate.enabled=true \
  --set controller.substrate.ateApiEndpoint=dns:///kagent-api.kagent.svc:443 \
  --set controller.substrate.ateApiInsecure=true \
  --set controller.substrate.atenetRouterURL=http://kagent-atenet-router.kagent.svc:80 \
  --set controller.substrate.ateApiServer.namespace=kagent \
  --set controller.substrate.ateApiServer.serviceAccount=kagent-ate-api-server \
  --set controller.substrate.workerPoolName=kagent-default \
  --set substrateWorkerPool.create=true \
  --set substrateWorkerPool.ateomImage=ghcr.io/kagent-dev/substrate/ateom-gvisor:v0.0.8 \
  --set substrateWorkerPool.sandboxClass=gvisor \
  --set grafana-mcp.enabled=false \
  --set observability-agent.enabled=false

Two overrides worth calling out:

  • substrate.atelet.extraArgs=[--localhost-registry-replacement=kind-registry:5000]
    Atelet uses go-containerregistry directly to pull the ActorTemplate container image (not containerd), so containerd's hosts.toml alias on the kind node doesn't apply to it. This flag tells atelet's puller: for any ref starting with localhost:*, rewrite the hostname to kind-registry:5000 (which resolves inside pods via cluster DNS on the kind docker network) AND parse it with name.Insecure so the fetch is plain HTTP. Both effects come from that single flag.

  • controller.substrate.ateApiServer.{namespace,serviceAccount}
    Substrate is deployed as a subchart of kagent, so substrate.fullname prefixes every resource with the release name. The ate-api-server SA ends up as kagent-ate-api-server in the kagent namespace (not the default ate-api-server in ate-system). Kagent's substrate-ate-api-rbac.yaml needs the correct namespace + SA to bind its secret-read Role for KAGENT_CONFIG_JSON env resolution during CallAteletRestore.
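
As a rough illustration of the first override, the sketch below shows the kind of rewrite the flag is described as performing: refs whose registry host is `localhost:<port>` get the replacement host, and the caller learns that the ref should be fetched over plain HTTP. This is not atelet's code and does not use go-containerregistry; `rewriteLocalhostRef` is a hypothetical name, and the string handling is a simplifying assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// rewriteLocalhostRef (hypothetical) swaps a localhost:<port> registry host
// for the replacement host. The boolean reports whether a rewrite happened,
// which a caller could use to decide on a plain-HTTP (insecure) fetch.
func rewriteLocalhostRef(ref, replacement string) (string, bool) {
	host, rest, ok := strings.Cut(ref, "/")
	if !ok || !strings.HasPrefix(host, "localhost:") {
		return ref, false // not a localhost registry ref; leave untouched
	}
	return replacement + "/" + rest, true
}

func main() {
	out, insecure := rewriteLocalhostRef(
		"localhost:5001/kagent-dev/kagent/app:v0.10.0", "kind-registry:5000")
	fmt.Println(out, insecure)

	out, insecure = rewriteLocalhostRef("ghcr.io/kagent-dev/kagent/app:v0.10.0", "kind-registry:5000")
	fmt.Println(out, insecure)
}
```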

6. Apply the default gVisor SandboxConfig

Substrate ships the CRD but not a default SandboxConfig resource. Without one, ate-controller fails resume with no default SandboxConfig for class "gvisor":

kubectl --context kind-kagent apply -f - <<'SBX'
apiVersion: ate.dev/v1alpha1
kind: SandboxConfig
metadata:
  name: gvisor-default
spec:
  sandboxClass: gvisor
  default: true
  assets:
    amd64:
      runsc:
        url: "gs://gvisor/releases/release/20260622/x86_64/runsc"
        sha256: "f18a948bf9c8bbb54eb998549a3a8d719a1c7de2efbe8fdd2ff0ee5fecd06f19"
    arm64:
      runsc:
        url: "gs://gvisor/releases/release/20260622/aarch64/runsc"
        sha256: "62eee121f8c188e347c428acc96f111568ede3be37b906046b6f28bbe2cc40c0"
SBX

7. Verify with a SandboxAgent

kubectl --context kind-kagent apply -f - <<'SA'
apiVersion: kagent.dev/v1alpha2
kind: SandboxAgent
metadata:
  name: substrate-demo
  namespace: kagent
spec:
  type: Declarative
  description: Minimal SandboxAgent running on Agent Substrate (gVisor).
  declarative:
    runtime: go
    modelConfig: default-model-config
    systemMessage: |
      You are a helpful assistant running inside a gVisor-sandboxed actor
      on Agent Substrate. Answer concisely.
  substrate:
    workerPoolRef:
      name: kagent-default
SA

kubectl --context kind-kagent get sandboxagent -n kagent -w
# NAME             READY   ACCEPTED
# substrate-demo   True    True

8. Verify with an AgentHarness

kubectl --context kind-kagent apply -f - <<'AH'
apiVersion: kagent.dev/v1alpha2
kind: AgentHarness
metadata:
  name: harness-demo
  namespace: kagent
spec:
  backend: openclaw
  description: OpenClaw on Agent Substrate
  modelConfigRef: default-model-config
  substrate:
    workerPoolRef:
      name: kagent-default
AH

kubectl --context kind-kagent get agentharness -n kagent -w
# NAME           BACKEND    READY   ID   AGE
# harness-demo   openclaw   True         ...

9. Chat via the UI

kubectl --context kind-kagent port-forward -n kagent svc/kagent-ui 3000:8080
# open http://localhost:3000 → pick substrate-demo or harness-demo → send a message

Copilot AI review requested due to automatic review settings July 2, 2026 18:53
@github-actions github-actions Bot added the enhancement New feature or request label Jul 2, 2026
@jjamroga jjamroga force-pushed the jjamroga/bump-substrate-0.0.8 branch from ffc7635 to cb7e67d Compare July 2, 2026 18:56
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Jul 2, 2026

Copilot AI left a comment

Contributor

Pull request overview

This PR updates kagent to work with substrate v0.0.8, adapting all actor lifecycle call sites to the new atespace-scoped identity model, updating atenet-router host header formatting, and improving local/helm dev ergonomics around substrate installation.

Changes:

  • Bump substrate dependency to v0.0.8 and refactor actor operations to use (atespace, actorID) / ActorRef.
  • Update atenet-router Host/DNS shape to include atespace as a DNS label, propagated through gateway + A2A transports.
  • Add helm/build/dev improvements (templated substrate chart repo, kind registry env overrides, optional insecure curl for digest resolution).
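
The atespace-aware Host shape can be sketched as a small builder. The domain suffix and label order follow the `<name>.<atespace>.actors.resources.substrate.ate.dev` form given in the PR description. The function name `actorHost` and its error return are assumptions for illustration and may differ from kagent's actual `ActorHost` signature.

```go
package main

import (
	"errors"
	"fmt"
)

// actorsDomain is the suffix from the Host shape in the PR description.
const actorsDomain = "actors.resources.substrate.ate.dev"

// actorHost (hypothetical signature) builds <name>.<atespace>.<actorsDomain>.
// Both parts are required, mirroring the "require atespace" note for gateway.go.
func actorHost(atespace, name string) (string, error) {
	if atespace == "" || name == "" {
		return "", errors.New("atespace and name are both required")
	}
	return fmt.Sprintf("%s.%s.%s", name, atespace, actorsDomain), nil
}

func main() {
	host, err := actorHost("kagent", "harness-demo")
	fmt.Println(host, err)

	_, err = actorHost("", "harness-demo")
	fmt.Println(err)
}
```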

Reviewed changes

Copilot reviewed 29 out of 30 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| scripts/kind/setup-kind.sh | Add env overrides for local registry name/port/scheme and adjust containerd hosts config + registry documentation. |
| scripts/controller-digest-ldflags.sh | Add optional DIGEST_CURL_INSECURE to skip TLS verification when resolving image digests. |
| Makefile | Add SUBSTRATE_REPO override and pass it through chart stamping. |
| helm/kagent/Chart-template.yaml | Template substrate chart repository. |
| helm/kagent-crds/Chart-template.yaml | Template substrate-crds chart repository. |
| go/go.sum | Update substrate to v0.0.8 and refresh indirect deps. |
| go/go.mod | Replace substrate to v0.0.8 and bump indirect deps. |
| go/core/pkg/sandboxbackend/substrate/openclaw.go | Pass atespace through actor calls and include it in backend Handle; update ActorHost format. |
| go/core/pkg/sandboxbackend/substrate/openclaw_test.go | Update ActorHost test for new DNS shape. |
| go/core/pkg/sandboxbackend/substrate/lifecycle_delete.go | Delete golden actors using the reserved ate-golden atespace. |
| go/core/pkg/sandboxbackend/substrate/lifecycle_delete_test.go | Update test client to the new ActorRef API and add atespace RPC stubs. |
| go/core/pkg/sandboxbackend/substrate/lifecycle_actortemplate.go | Explicitly set SnapshotsConfig defaults to avoid reconcile drift loops. |
| go/core/pkg/sandboxbackend/substrate/gateway.go | Require atespace and include it in router Host construction. |
| go/core/pkg/sandboxbackend/substrate/gateway_test.go | Update gateway target tests for new atespace-aware host format and validations. |
| go/core/pkg/sandboxbackend/substrate/delete_actor.go | Thread atespace through delete/suspend/resume paths. |
| go/core/pkg/sandboxbackend/substrate/delete_actor_test.go | Update deleteActor test signature for atespace parameter. |
| go/core/pkg/sandboxbackend/substrate/client.go | Switch to ActorRef in RPCs; add EnsureAtespace helper; update method signatures. |
| go/core/pkg/sandboxbackend/substrate/agentharness_actor.go | Update AgentHarness actor ops for atespace-scoped identity and handles. |
| go/core/pkg/sandboxbackend/substrate/agent_lifecycle.go | Set SnapshotsConfig defaults explicitly for SandboxAgent ActorTemplates. |
| go/core/pkg/sandboxbackend/substrate/agent_actor.go | Update SandboxAgent actor ops for atespace-scoped identity and handles. |
| go/core/pkg/sandboxbackend/substrate/actor_reachability.go | Thread atespace through reachability checks and router targeting. |
| go/core/pkg/sandboxbackend/substrate/actor_reachability_test.go | Update tests for new Host/DNS shape and new atespace parameter. |
| go/core/pkg/sandboxbackend/async.go | Extend backend Handle to carry Atespace for substrate backends. |
| go/core/internal/httpserver/handlers/substrate.go | Add Atespace to /api/substrate/status actor entries. |
| go/core/internal/httpserver/handlers/agentharness_gateway.go | Resolve gateway target using atespace and log it. |
| go/core/internal/httpserver/handlers/agentharness_gateway_test.go | Update ACP proxy tests for atespace-in-host format. |
| go/core/internal/a2a/substrate_transport.go | Update substrate round-tripper creation to include atespace. |
| go/core/internal/a2a/substrate_sandbox_transport.go | Pass atespace into substrate round-tripper when proxying A2A via atenet-router. |
| go/api/httpapi/substrate.go | Add atespace field to SubstrateActorEntry JSON model. |
| examples/substrate-openclaw/README.md | Document atelet image pull args for kind/local registries and subchart prefixing. |
Comments suppressed due to low confidence (1)

scripts/kind/setup-kind.sh:23

  • REG_SCHEME can be set to https, but when the registry container is created by this script it runs registry:2 with no TLS. If a user sets REG_SCHEME=https and the container doesn’t already exist, containerd will be configured to use HTTPS against an HTTP registry and pulls will fail. Consider rejecting non-http schemes when bootstrapping a new registry container (or force reg_scheme=http in that branch).
# Override REG_NAME / REG_PORT / REG_SCHEME to reuse an existing local registry
# (e.g. an HTTPS registry on another port) instead of creating a fresh kind-registry.
reg_name="${REG_NAME:-kind-registry}"
reg_port="${REG_PORT:-5001}"
reg_scheme="${REG_SCHEME:-http}"
if [ "$("${CONTAINER_RUNTIME}" inspect -f '{{.State.Running}}' "${reg_name}" 2>/dev/null || true)" != 'true' ]; then
  "${CONTAINER_RUNTIME}" run \
    -d --restart=always -p "127.0.0.1:${reg_port}:5000" --network bridge --name "${reg_name}" \
    registry:2
fi

Comment thread scripts/kind/setup-kind.sh
Comment thread go/core/pkg/sandboxbackend/substrate/client.go
@jjamroga jjamroga force-pushed the jjamroga/bump-substrate-0.0.8 branch from cb7e67d to 692abdf Compare July 2, 2026 19:57
Adapts kagent for substrate v0.0.8's atespace-scoped ActorRef identity
model (rename of ActorId→ActorRef{Atespace,Name} on all actor RPCs). Maps
atespace 1:1 to the SandboxAgent/AgentHarness Kubernetes namespace, adds
an EnsureAtespace idempotent helper, and updates the atenet-router Host
header shape to include the atespace label.

Also fixes a pre-existing kagent bug that PR kagent-dev#2109's ActorTemplate spec
immutability change surfaced: SnapshotsConfig.{OnPause,OnCommit} were
left zero-value in kagent's desired spec but the API server defaults
them to "Full" on admission, causing apiequality.Semantic.DeepEqual to
report drift every reconcile and hot-loop delete/recreate the
ActorTemplate CR.

Verified end-to-end on colima+kind with substrate v0.0.8 published
charts: SandboxAgent (declarative Go) and AgentHarness (openclaw) both
reach Ready=True and chat round-trip works.

Signed-off-by: Jonathan Jamroga <jjamroga@gmail.com>
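
The reconcile drift described in the commit message can be reproduced in miniature. This sketch uses the standard library's `reflect.DeepEqual` as a stand-in for `apiequality.Semantic.DeepEqual`, a local `SnapshotsConfig` struct with only the two fields named above, and a hypothetical `applyDefaults` function imitating server-side defaulting. None of it is the controller's code.

```go
package main

import (
	"fmt"
	"reflect"
)

// SnapshotsConfig is a local stand-in with only the two fields named above.
type SnapshotsConfig struct {
	OnPause  string
	OnCommit string
}

// snapshotScopeFull mirrors the "Full" default mentioned in the commit message.
const snapshotScopeFull = "Full"

// applyDefaults (hypothetical) imitates the API server filling empty fields
// on admission.
func applyDefaults(c SnapshotsConfig) SnapshotsConfig {
	if c.OnPause == "" {
		c.OnPause = snapshotScopeFull
	}
	if c.OnCommit == "" {
		c.OnCommit = snapshotScopeFull
	}
	return c
}

// drifted reports whether the stored object differs from the desired one.
func drifted(existing, desired SnapshotsConfig) bool {
	return !reflect.DeepEqual(existing, desired)
}

func main() {
	zero := SnapshotsConfig{} // desired spec before the fix
	fmt.Println("zero-value desired drifts:", drifted(applyDefaults(zero), zero))

	explicit := SnapshotsConfig{OnPause: snapshotScopeFull, OnCommit: snapshotScopeFull}
	fmt.Println("explicit desired drifts:", drifted(applyDefaults(explicit), explicit))
}
```

With a zero-value desired spec the comparison never settles, because the stored object always carries the defaults. Setting the fields explicitly makes the desired spec a fixed point of defaulting.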
@jjamroga jjamroga force-pushed the jjamroga/bump-substrate-0.0.8 branch from 692abdf to d40449e Compare July 2, 2026 20:00
jjamroga commented Jul 2, 2026

Collaborator Author

Addressed both Copilot review comments in d40449e6:

  1. setup-kind.sh local-registry-hosting ConfigMap — good catch, that was a KEP-1755 violation. Restored host: localhost:${reg_port} (developer-machine reachable) and added hostFromClusterNetwork: ${reg_name}:${reg_internal_port} so consumers of the ConfigMap can still find the in-cluster address.

  2. EnsureAtespace unit tests — added TestEnsureAtespace in client_test.go with 4 cases: AlreadyExists is treated as success, plain success passes through, non-AlreadyExists gRPC errors propagate, and non-gRPC errors propagate.
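
The idempotency contract those test cases describe can be sketched with standard-library stand-ins. Here `errAlreadyExists` takes the place of a gRPC AlreadyExists status, `createFunc` stands in for the substrate create RPC, and `ensureAtespace` is a simplified illustration. The real helper in client.go works with gRPC status codes and may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for gRPC statuses; the real helper inspects gRPC status codes.
var (
	errAlreadyExists = errors.New("already exists")
	errUnavailable   = errors.New("unavailable")
)

// createFunc is a hypothetical stand-in for the create-atespace RPC.
type createFunc func(atespace string) error

// ensureAtespace treats "already exists" as success so callers can invoke it
// before every actor creation; any other error propagates, wrapped with context.
func ensureAtespace(create createFunc, atespace string) error {
	err := create(atespace)
	if err == nil || errors.Is(err, errAlreadyExists) {
		return nil
	}
	return fmt.Errorf("ensure atespace %q: %w", atespace, err)
}

func main() {
	existing := map[string]bool{}
	create := func(name string) error {
		if existing[name] {
			return errAlreadyExists
		}
		existing[name] = true
		return nil
	}
	fmt.Println(ensureAtespace(create, "kagent")) // first call creates
	fmt.Println(ensureAtespace(create, "kagent")) // second call is a no-op

	failing := func(string) error { return errUnavailable }
	fmt.Println(ensureAtespace(failing, "kagent")) // other errors propagate
}
```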
