
Speed up resource model on large pod lists (skip client-side deserialization) #201

Open
karldebisschop wants to merge 1 commit into rundeck-plugins:master from
karldebisschop:perf-skip-pod-list-deserialization


@karldebisschop
Contributor

Summary

On clusters with many pods, the resource-model source spends most of its
wall-clock time in the Kubernetes client deserializing the list response into
typed model objects. This PR speeds that up and adds two opt-in options, with
no change to default behavior.

Changes

  • Skip client-side deserialization (the main win). collect_pods_from_api
    now requests the raw response (_preload_content=False) and parses it with
    json.loads(resp.data).get('items', []) instead of building the client's
    per-object model tree, which dominates runtime on large pod lists.
    nodeCollectData and main read plain dicts (camelCase keys), and
    startedAt is reformatted from its RFC 3339 string.

  • Parse per-run config once. main parses tags, mappings, defaults, the
    emoticon flag, and the config file a single time and passes them to
    nodeCollectData, instead of re-parsing the same config strings for every
    container.

  • Opt-in: Exclude Namespaces (RD_CONFIG_EXCLUDE_NAMESPACES). A
    comma-separated list of namespaces to exclude server-side via the field
    selector (only when no specific Namespace is set). Defaults to empty, so
    nothing is excluded unless configured.

  • Opt-in: Use API Cache (RD_CONFIG_USE_CACHE). When enabled, passes
    resource_version=0 so the apiserver serves the list from its in-memory
    watch cache rather than via a quorum read from etcd. This is faster on
    large clusters and lighter on the control plane, at the cost of possibly
    stale data. Defaults to off (strong reads, unchanged).

  • Compact output. Drops indent from the JSON output (keeps sort_keys
    for stable ordering), trimming output size.
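
As a rough illustration of the first, third, and fourth changes above, the list-call options and the raw-response parsing could look something like the sketch below. This is not the PR's actual code: `build_list_kwargs` and `parse_pod_items` are hypothetical helper names, and the way the exclusion list is split is an assumption. The PR text writes `resource_version=0`; the sketch uses the string form `"0"`.

```python
import json


def build_list_kwargs(namespace=None, exclude_namespaces="", use_cache=False):
    """Build keyword arguments for a pod-list call (hypothetical helper)."""
    # Ask the client for the raw HTTP response instead of typed model objects.
    kwargs = {"_preload_content": False}

    # Exclusions only apply when no specific namespace was requested.
    excluded = [ns.strip() for ns in exclude_namespaces.split(",") if ns.strip()]
    if excluded and not namespace:
        kwargs["field_selector"] = ",".join(
            "metadata.namespace!=" + ns for ns in excluded
        )

    # Resource version "0" lets the apiserver answer from its watch cache
    # (possibly stale) instead of doing a quorum read from etcd.
    if use_cache:
        kwargs["resource_version"] = "0"
    return kwargs


def parse_pod_items(raw_body):
    """Return the list of pod dicts from a raw list response body."""
    # json.loads accepts both bytes and str; a missing "items" key yields [].
    return json.loads(raw_body).get("items", [])
```

With the Kubernetes Python client, the result would presumably be used along the lines of `resp = v1.list_pod_for_all_namespaces(**build_list_kwargs(...))` followed by `parse_pod_items(resp.data)`, where `v1` is a `CoreV1Api` instance. How the plugin combines this with any existing field selector is not shown here.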

Compatibility

Default behavior is unchanged: the same node attributes are produced, the
running/terminated filtering is identical, and both new options are off by
default. The unit tests were updated to build dict-based pod fixtures matching
the raw API JSON, with added coverage for the namespace-exclusion and
API-cache paths; the full suite passes.
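
For readers unfamiliar with the raw API shape, a dict-based fixture and a camelCase lookup might look roughly like this. It is a minimal sketch, not the test suite's real fixtures: `make_pod` and `running_started_at` are hypothetical names, the fixture carries only a few fields, and returning a `datetime` is an assumption (the PR does not state the target format for `startedAt`).

```python
from datetime import datetime, timezone


def make_pod(name, namespace, started_at=None):
    """Build a minimal dict-based pod fixture (hypothetical helper)."""
    # A running container carries state.running.startedAt; otherwise the
    # fixture models a terminated container.
    if started_at:
        state = {"running": {"startedAt": started_at}}
    else:
        state = {"terminated": {"exitCode": 0}}
    return {
        "metadata": {"name": name, "namespace": namespace},
        "status": {
            "phase": "Running" if started_at else "Succeeded",
            "containerStatuses": [{"name": "app", "state": state}],
        },
    }


def running_started_at(pod):
    """Return the first running container's start time as a datetime, or None."""
    for status in pod.get("status", {}).get("containerStatuses", []):
        started = status.get("state", {}).get("running", {}).get("startedAt")
        if started:
            # Assumes whole-second RFC 3339 UTC strings such as
            # 2024-05-01T12:30:00Z.
            parsed = datetime.strptime(started, "%Y-%m-%dT%H:%M:%SZ")
            return parsed.replace(tzinfo=timezone.utc)
    return None
```

The point of the fixture shape is that tests exercise the same key names (`containerStatuses`, `startedAt`) the raw JSON uses, rather than the snake_case attributes of the client's model objects.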

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- Parse the pod list from the raw API response (_preload_content=False +
  json.loads) instead of building the client's typed model objects, whose
  per-object deserialization dominates wall-clock time on large pod lists.
  nodeCollectData and main read plain dicts (camelCase keys); startedAt is
  reformatted from its RFC 3339 string.
- Parse per-run config (tags, mappings, defaults, emoticon, config file) once
  in main and pass it to nodeCollectData instead of re-parsing it per node.
- Add opt-in Exclude Namespaces (RD_CONFIG_EXCLUDE_NAMESPACES): exclude
  namespaces server-side via the field selector. Default empty (no change).
- Add opt-in Use API Cache (RD_CONFIG_USE_CACHE): pass resource_version=0 so
  the apiserver serves the list from its watch cache. Default off.
- Emit compact JSON (drop indent; keep sort_keys for stable output).

Tests updated to dict-based pod fixtures.
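
The effect of the compact-JSON change can be illustrated with a small sketch. The `nodes` dict is made-up sample data, not the plugin's real output.

```python
import json

# Hypothetical sample of resource-model output; real entries carry more keys.
nodes = {
    "web-0": {"nodename": "web-0", "hostname": "web-0", "tags": "running"},
    "api-1": {"nodename": "api-1", "hostname": "api-1", "tags": "running"},
}

# Previous style: indented, one key per line.
pretty = json.dumps(nodes, indent=4, sort_keys=True)

# New style: no indent, so the output is a single line; sort_keys keeps the
# key order stable between runs.
compact = json.dumps(nodes, sort_keys=True)
```

Both strings decode to the same data; only whitespace differs. Without `indent`, `json.dumps` still uses the default `", "` and `": "` separators, so the output is not the smallest possible encoding.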