
[JBRes-9717] Allow users to specify more specs for pods#160

Open
vpoliakov-pixel wants to merge 4 commits into main from vladimir.poliakov/jbres-9717-pod-spec

@vpoliakov-pixel (Contributor)

Why

Today StartServerRequest only lets callers customize a narrow slice of the server pod: resources (CPU, memory, ephemeral-storage), node_selector, runtime_class_name, and run_as_root. Pipelines that run agents routinely need to mount secrets and volumes (credentials, configmaps, shared data) and occasionally to adjust other pod-level knobs.

What

Adds a hybrid customization surface to StartServerRequest, threaded end to end (client → orchestrator → deploy_server → V1PodSpec):

  • Typed, validated fields for the common cases: volumes, volume_mounts, env_from, service_account_name.
  • A generic pod_overrides escape hatch (partial V1PodSpec, deep-merged) for everything else — tolerations, hostAliases, dnsConfig, pod-level securityContext, sidecars, …

All k8s-shaped inputs use native camelCase manifest shape (secretName, mountPath, readOnly) so users can paste straight from k8s docs.
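To make the mapping concrete, here is a minimal sketch of where each typed field could land in a camelCase pod spec. The helper name build_pod_spec, the plain-dict representation, and attaching volumeMounts/envFrom to the managed "server" container are assumptions for illustration; the PR's deploy_server works on kubernetes V1PodSpec models rather than dicts.

```python
from typing import Any, Dict, List, Optional


# Hypothetical helper, for illustration only: the real deploy_server builds
# kubernetes V1PodSpec models, not plain dicts.
def build_pod_spec(
    image: str,
    volumes: Optional[List[Dict[str, Any]]] = None,
    volume_mounts: Optional[List[Dict[str, Any]]] = None,
    env_from: Optional[List[Dict[str, Any]]] = None,
    service_account_name: Optional[str] = None,
) -> Dict[str, Any]:
    # Assumption: mounts and envFrom are attached to the managed "server" container.
    server: Dict[str, Any] = {"name": "server", "image": image}
    if volume_mounts:
        server["volumeMounts"] = list(volume_mounts)
    if env_from:
        server["envFrom"] = list(env_from)

    # Volumes and the ServiceAccount are pod-level fields.
    spec: Dict[str, Any] = {"containers": [server]}
    if volumes:
        spec["volumes"] = list(volumes)
    if service_account_name:
        spec["serviceAccountName"] = service_account_name
    return spec


# Inputs copied in native manifest shape, as a user would paste them from k8s docs.
spec = build_pod_spec(
    image="my-env:latest",
    volumes=[{"name": "agent-creds", "secret": {"secretName": "agent-creds"}}],
    volume_mounts=[{"name": "agent-creds", "mountPath": "/etc/creds", "readOnly": True}],
    env_from=[{"secretRef": {"name": "agent-creds"}}],
    service_account_name="agent-runner",
)
```

Because the keys are already camelCase, no renaming happens between the caller's input and the resulting spec.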

How it works

  • deserialize_k8s (new helper in kubernetes_client.py) converts camelCase dicts to V1* models via the kubernetes client's own deserializer, reusing the cached ApiClient (no extra aiohttp session).
  • pod_overrides is layered last via a sanitize_for_serialization → deep_merge(concat_lists=True) → deserialize roundtrip: scalars override and list fields concatenate, so the managed server container survives and node-pool tolerations and the regcred pull secret are preserved. Overrides can therefore add sidecars but cannot replace the server container.
  • deep_merge added to common-utils/.../dict.py as a reusable utility.
  • ServiceAccount precedence: normal start/restore → caller SA wins; snapshot preparation → snapshot SA only (it carries snapshot permissions). New runtime-only fields are intentionally excluded from the snapshot-reuse hash.
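The merge semantics described above can be sketched with plain dicts standing in for the output of sanitize_for_serialization. This deep_merge is an illustrative reimplementation that follows the behavior the PR describes (recurse into dicts, scalars override, lists concatenate when concat_lists=True, inputs not mutated); the exact signature in common-utils is an assumption.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any], concat_lists: bool = False) -> Dict[str, Any]:
    """Return a new dict: nested dicts merge recursively, scalars from `override` win,
    and lists are concatenated (base first) when concat_lists=True, otherwise replaced."""
    result = copy.deepcopy(base)  # never mutate the caller's inputs
    for key, value in override.items():
        current = result.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            result[key] = deep_merge(current, value, concat_lists)
        elif concat_lists and isinstance(current, list) and isinstance(value, list):
            result[key] = current + copy.deepcopy(value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Managed pod spec fields (camelCase, as produced by sanitize_for_serialization).
managed = {
    "containers": [{"name": "server", "image": "my-env:latest"}],
    "tolerations": [{"key": "pool", "operator": "Exists"}],
    "imagePullSecrets": [{"name": "regcred"}],
    "terminationGracePeriodSeconds": 30,
}
# Caller's pod_overrides: a sidecar, an extra toleration, and a scalar override.
overrides = {
    "containers": [{"name": "sidecar", "image": "busybox"}],
    "tolerations": [{"key": "dedicated", "operator": "Exists", "effect": "NoSchedule"}],
    "terminationGracePeriodSeconds": 5,
}
merged = deep_merge(managed, overrides, concat_lists=True)
```

Concatenation is what keeps the server container and the node-pool toleration in place: an override list is appended to the managed list instead of replacing it, while scalars such as terminationGracePeriodSeconds simply take the caller's value.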

Example

await client.start_server(
    image_tag="my-env:latest",
    volumes=[{"name": "agent-creds", "secret": {"secretName": "agent-creds"}}],
    volume_mounts=[{"name": "agent-creds", "mountPath": "/etc/creds", "readOnly": True}],
    env_from=[{"secretRef": {"name": "agent-creds"}}],
    service_account_name="agent-runner",
    pod_overrides={"tolerations": [{"key": "dedicated", "operator": "Exists", "effect": "NoSchedule"}]},
)

Tests

  • Unit (uv run pytest -m unit — all green, 521 passed):
    • test_dict.py: deep_merge (recursion, list concatenation vs. replacement, scalar override, no input mutation).
    • test_kubernetes_client.py (new) — deploy_server builds the pod spec with volumes/volume_mounts/env_from, and pod_overrides merges pod-level fields / concatenates lists / keeps the managed container / can add a sidecar; plus StartServerRequest field defaults & camelCase acceptance.
  • E2E (uv run pytest -m e2e, needs a cluster — added, not run in this PR): secret volume mount + env_from, service_account_name, and pod_overrides hostAliases → /etc/hosts.

Notes

  • Validation of the k8s shapes is delegated to the API server (the client deserializer silently drops unknown keys); typed fields are the safe, discoverable path and pod_overrides is the escape hatch.
  • New fields are additive optionals, so stored-request (de)serialization and snapshot reuse-matching are unaffected.
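One way to keep snapshot reuse unaffected is to compute the reuse key over the request with the runtime-only fields dropped. The sketch below assumes a dict-shaped request, a SHA-256 over canonical JSON, and a hypothetical RUNTIME_ONLY_FIELDS set; the PR does not show how its reuse hash is actually computed.

```python
import hashlib
import json
from typing import Any, Dict

# Hypothetical: the runtime-only fields added in this PR, excluded from reuse matching.
RUNTIME_ONLY_FIELDS = frozenset(
    {"volumes", "volume_mounts", "env_from", "service_account_name", "pod_overrides"}
)


def snapshot_reuse_key(request: Dict[str, Any]) -> str:
    """Hash the request without runtime-only fields, so requests that differ only in
    mounts, envFrom, ServiceAccount, or pod_overrides map to the same snapshot."""
    relevant = {k: v for k, v in request.items() if k not in RUNTIME_ONLY_FIELDS}
    # sort_keys gives a canonical encoding independent of dict insertion order.
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Dropping the fields by name also means an older stored request that lacks them and a new request that carries them as None produce the same key.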

…rom, SA, pod_overrides)

StartServerRequest previously exposed only resources, node_selector, runtime_class_name and run_as_root. Pipelines
running agents commonly need to mount secrets/volumes and tweak other pod-level
fields, so add a hybrid customization surface threaded end to end
(client -> orchestrator -> deploy_server -> V1PodSpec):

- Typed, validated fields: volumes, volume_mounts, env_from, service_account_name
- Generic pod_overrides escape hatch (partial V1PodSpec) for everything else

All k8s-shaped inputs use native camelCase manifest shape and are converted to V1*
models via the kubernetes client's own deserializer (new deserialize_k8s helper that
reuses the cached ApiClient). pod_overrides is layered last via a
sanitize -> deep_merge(concat_lists) -> deserialize roundtrip, so the managed server
container survives and list fields (tolerations, volumes, pull secrets) are
concatenated rather than replaced.

ServiceAccount precedence: caller SA wins on normal start/restore; snapshot
preparation always uses the snapshot SA. New runtime-only fields are intentionally
excluded from the snapshot-reuse hash.
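The precedence rule can be read as a small decision function. This is a hypothetical sketch: the function name, its parameters, and the default_sa fallback (whatever the orchestrator would pick when the caller sets no ServiceAccount) are assumptions, not the orchestrator's code.

```python
from typing import Optional


def resolve_service_account(
    caller_sa: Optional[str],
    snapshot_sa: Optional[str],
    preparing_snapshot: bool,
    default_sa: Optional[str] = None,
) -> Optional[str]:
    # Snapshot preparation always runs under the snapshot SA, which carries the
    # permissions the snapshot pipeline needs; the caller cannot override it.
    if preparing_snapshot:
        return snapshot_sa
    # Normal start/restore: an explicit caller SA wins.
    if caller_sa:
        return caller_sa
    # Assumption: otherwise fall back to whatever the orchestrator uses by default.
    return default_sa
```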

Adds deep_merge to common-utils, unit tests for deep_merge and deploy_server's pod
spec, and e2e tests for secret volume mounts, env_from, service_account_name, and
pod_overrides.

Replace the loosely typed list[dict[str, Any]] / dict[str, Any] fields with typed,
native-Kubernetes-shaped Pydantic models for IDE support and strict typing, following
the existing KubernetesResources precedent:

- New api/pod_spec.py: KubernetesVolume (+ secret/configMap/emptyDir/pvc sources),
  KubernetesVolumeMount, KubernetesEnvFromSource (+ secret/configMap refs), and
  KubernetesPodOverrides (+ Toleration/HostAlias). Fields are snake_case and
  (de)serialize to/from camelCase via a to_camel alias generator; extra="allow" forwards
  unmodeled fields (csi, projected, affinity, ...) verbatim so the long tail still works.

StartServerRequest and the client now use these types; the orchestrator dumps them to
camelCase dicts (model_dump(by_alias=True, exclude_none=True)) before deploy_server,
exactly as resources is already handled. deploy_server stays dict-based (k8s adapter).
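A trimmed sketch of this model pattern, assuming Pydantic v2 (ConfigDict and pydantic.alias_generators.to_camel). The base class _K8sModel and SecretVolumeSource are hypothetical names, and the classes below carry only a few of the fields the real api/pod_spec.py models define.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel


class _K8sModel(BaseModel):
    # snake_case in Python, camelCase on the wire; populate_by_name also accepts
    # snake_case input, and extra="allow" keeps unmodeled keys (csi, projected, ...).
    model_config = ConfigDict(alias_generator=to_camel, populate_by_name=True, extra="allow")


class SecretVolumeSource(_K8sModel):
    secret_name: str
    optional: Optional[bool] = None


class KubernetesVolume(_K8sModel):
    name: str
    secret: Optional[SecretVolumeSource] = None


class KubernetesVolumeMount(_K8sModel):
    name: str
    mount_path: str
    read_only: Optional[bool] = None


# camelCase input pasted from a manifest validates into typed fields...
vol = KubernetesVolume.model_validate({"name": "agent-creds", "secret": {"secretName": "agent-creds"}})
# ...and the orchestrator-style dump turns it back into a camelCase dict.
vol_dict = vol.model_dump(by_alias=True, exclude_none=True)
```

exclude_none drops unset optionals such as optional or read_only, so the dict handed to deploy_server contains only what the caller actually specified, and unmodeled extras such as csi are passed through under their original keys.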

Adds unit tests for the models and updates the existing unit/e2e tests to the typed API.

Copilot AI left a comment


Pull request overview

This PR expands StartServerRequest’s Kubernetes pod customization surface so callers can mount volumes/secrets, import environment variables via envFrom, set a ServiceAccount, and apply a deep-merged pod_overrides escape hatch that’s threaded from client → orchestrator → deploy_server → V1PodSpec.

Changes:

  • Add typed “native Kubernetes shape” models (idegym.api.pod_spec) and new StartServerRequest fields: volumes, volume_mounts, env_from, service_account_name, pod_overrides.
  • Implement Kubernetes camelCase deserialization + deep-merge override application in backend-utils deploy_server.
  • Add unit + e2e tests and supporting e2e Kubernetes helpers (secret + ServiceAccount lifecycle).

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Summary per file:

  • unit-tests/test_pod_spec_models.py: Unit tests for new typed pod-spec models and camelCase round-tripping.
  • unit-tests/test_kubernetes_client.py: Unit tests verifying deploy_server applies typed pod fields and merges pod_overrides.
  • unit-tests/test_dict.py: Adds unit tests for the new deep_merge helper.
  • orchestrator/src/idegym/orchestrator/snapshot_pipeline.py: Threads new pod customization fields through snapshot pipeline deploy.
  • orchestrator/src/idegym/orchestrator/router/server.py: Threads new pod customization fields through normal server start; adjusts SA precedence logic.
  • e2e-tests/utils/k8s_client.py: Adds e2e helpers to create/delete Secrets and ServiceAccounts.
  • e2e-tests/test_pod_spec_overrides.py: Adds e2e coverage for secret volume mounts, envFrom, ServiceAccount, and pod_overrides.
  • common-utils/src/idegym/utils/dict.py: Introduces reusable deep_merge utility (with optional list concatenation).
  • client/src/idegym/client/operations/servers.py: Exposes new pod customization parameters through client operations.
  • client/src/idegym/client/client.py: Exposes new pod customization parameters on the public client APIs.
  • backend-utils/src/idegym/backend/utils/kubernetes_client.py: Adds deserialize_k8s, applies volumes/mounts/envFrom, and deep-merges pod_overrides into V1PodSpec.
  • api/src/idegym/api/pod_spec.py: Adds new typed “native Kubernetes shape” models with camelCase aliases and extra passthrough.
  • api/src/idegym/api/orchestrator/servers.py: Extends StartServerRequest with new pod customization fields.


Comment thread backend-utils/src/idegym/backend/utils/kubernetes_client.py
Comment thread e2e-tests/utils/k8s_client.py
…r container)

Address review: pod_overrides was deep-merged after the managed fields, so a caller could
override serviceAccountName (bypassing the "snapshot SA wins" guarantee) or replace/drop the
managed "server" container (e.g. {"containers": null}).

deploy_server now, before merging pod_overrides:
- drops null values so a managed field can't be deleted by setting it to null;
- rejects serviceAccountName (it is owned by the service_account_name parameter);
- rejects a non-list containers or any override container named "server" (sidecars stay allowed).

Updates the pod_overrides API doc to state both invariants and adds unit tests.
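The three pre-merge checks can be sketched over a plain camelCase dict. The function names and error messages are hypothetical; the rules follow the commit message: nulls are dropped first, serviceAccountName is rejected, and containers must be a list that contains no entry named "server".

```python
from typing import Any, Dict

MANAGED_CONTAINER = "server"


def drop_nulls(value: Any) -> Any:
    """Recursively remove None values from dicts (including dicts nested in lists)."""
    if isinstance(value, dict):
        return {k: drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_nulls(v) for v in value]
    return value


def sanitize_pod_overrides(overrides: Dict[str, Any]) -> Dict[str, Any]:
    # 1. Nulls are dropped, so {"containers": None} cannot delete a managed field.
    cleaned = drop_nulls(overrides)
    # 2. The ServiceAccount is owned by the service_account_name parameter.
    if "serviceAccountName" in cleaned:
        raise ValueError("set the ServiceAccount via service_account_name, not pod_overrides")
    # 3. Sidecars are allowed, but the managed container cannot be replaced.
    containers = cleaned.get("containers")
    if containers is not None:
        if not isinstance(containers, list):
            raise ValueError("pod_overrides.containers must be a list")
        for container in containers:
            if isinstance(container, dict) and container.get("name") == MANAGED_CONTAINER:
                raise ValueError(f"pod_overrides may not define a container named {MANAGED_CONTAINER!r}")
    return cleaned
```

Running these checks before deep_merge means the merge itself can stay generic; it never has to know which fields are managed.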
… e2e helper

Address review: replace_namespaced_secret is an update and requires
metadata.resourceVersion for optimistic locking. On a 409 (secret already exists from a
prior run), read the existing secret and copy its resourceVersion before replacing, so the
create_secret helper is actually idempotent.
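A sketch of the idempotent pattern, written against the CoreV1Api method names (create_namespaced_secret, read_namespaced_secret, replace_namespaced_secret) but with the API object passed in, so it can be exercised with a fake. The helper name is hypothetical, and the body is assumed to be a camelCase dict, which the kubernetes client also accepts in place of a V1Secret.

```python
from typing import Any, Dict


def create_or_replace_secret(api: Any, namespace: str, body: Dict[str, Any]) -> Any:
    """Create a Secret; if it already exists (HTTP 409), replace it using the live
    resourceVersion so the update passes the API server's optimistic-locking check."""
    try:
        return api.create_namespaced_secret(namespace=namespace, body=body)
    except Exception as exc:  # with the real client this is kubernetes.client.ApiException
        if getattr(exc, "status", None) != 409:
            raise
    name = body["metadata"]["name"]
    existing = api.read_namespaced_secret(name=name, namespace=namespace)
    # Copy instead of mutating the caller's body, then attach the current resourceVersion.
    updated = {
        **body,
        "metadata": {**body["metadata"], "resourceVersion": existing.metadata.resource_version},
    }
    return api.replace_namespaced_secret(name=name, namespace=namespace, body=updated)
```

Any status other than 409 is re-raised, so real failures such as a missing namespace or RBAC errors still surface in the e2e run.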