CLOUDP-377831: Rewrite prepare local e2e run script#773

Open
Julien-Ben wants to merge 15 commits into master from CLOUDP-377831_rewrite-prepare-local-e2e-run

@Julien-Ben (Collaborator) commented Feb 11, 2026

Summary

The local e2e preparation script (prepare_local_e2e_run.sh) was taking ~45-51s on multi-cluster setups, mostly due to sequential kubectl subprocess spawning. This PR brings it down to ~15-19s.

Every local e2e run starts with this script, so saving 30s in each iteration adds up fast during development.

What changed

Go rewrite of multi-cluster prep: The biggest win. Replaced the bash functions prepare_multi_cluster_e2e_run and configure_multi_cluster_environment with a Go binary (scripts/dev/prepare-multi-cluster/main.go) that uses client-go directly instead of shelling out to kubectl. Operations across clusters run in parallel via goroutines. This alone took the multi-cluster prep step from ~11s to ~1.4s.
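As a rough illustration of the fan-out described above, here is a minimal sketch that runs one function per cluster in its own goroutine and joins the failures. It is not the code from main.go: `forEachCluster`, `errSimulated`, and the context names are hypothetical, and the per-cluster work is a placeholder where the real binary would call client-go.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errSimulated is a stand-in failure used only by this sketch.
var errSimulated = errors.New("simulated failure")

// forEachCluster runs fn once per cluster, each in its own goroutine,
// and returns all failures joined into a single error (nil if none).
func forEachCluster(clusters []string, fn func(cluster string) error) error {
	var wg sync.WaitGroup
	errs := make([]error, len(clusters)) // one slot per goroutine, so no locking is needed

	for i, cluster := range clusters {
		wg.Add(1)
		go func(i int, cluster string) {
			defer wg.Done()
			if err := fn(cluster); err != nil {
				errs[i] = fmt.Errorf("cluster %s: %w", cluster, err)
			}
		}(i, cluster)
	}

	wg.Wait()
	return errors.Join(errs...) // nil entries are discarded
}

func main() {
	// Hypothetical kube context names.
	clusters := []string{"kind-central", "kind-member-1", "kind-member-2", "kind-member-3"}

	err := forEachCluster(clusters, func(cluster string) error {
		// Placeholder for per-cluster work (namespaces, RBAC, config maps, ...).
		if cluster == "kind-member-2" {
			return errSimulated
		}
		return nil
	})
	fmt.Println(err)
}
```

With this shape the wall-clock cost is roughly that of the slowest cluster rather than the sum over all clusters, which is consistent with the speedup reported for this step.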

The bash version used a helm chart to template RBAC resources (a ServiceAccount and a ClusterRoleBinding). The Go version creates these directly via the typed API with idempotent get-or-create-or-update logic. The helm chart is unchanged and still used by CI; only the local dev path bypasses it.
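The get-or-create-or-update pattern mentioned above can be sketched as follows. This is an assumption about the general shape, not the PR's implementation: `store`, `object`, `ensure`, and `errNotFound` are hypothetical, and an in-memory map stands in for the typed Kubernetes client so the example runs on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for the NotFound error an API server would return.
var errNotFound = errors.New("not found")

// object is a hypothetical, stripped-down resource: a name plus a spec string.
type object struct {
	Name string
	Spec string
}

// store is a hypothetical in-memory substitute for a typed client.
type store map[string]object

func (s store) Get(name string) (object, error) {
	obj, ok := s[name]
	if !ok {
		return object{}, errNotFound
	}
	return obj, nil
}

func (s store) Create(obj object) error { s[obj.Name] = obj; return nil }
func (s store) Update(obj object) error { s[obj.Name] = obj; return nil }

// ensure brings the stored object in line with desired and reports what it did.
// Calling it again with the same input changes nothing.
func ensure(s store, desired object) (string, error) {
	current, err := s.Get(desired.Name)
	switch {
	case errors.Is(err, errNotFound):
		return "created", s.Create(desired)
	case err != nil:
		return "", err
	case current == desired:
		return "unchanged", nil
	default:
		return "updated", s.Update(desired)
	}
}

func main() {
	s := store{}
	for _, spec := range []string{"v1", "v1", "v2"} {
		action, err := ensure(s, object{Name: "operator-sa", Spec: spec})
		fmt.Println(action, err)
	}
}
```

Because a repeated call with identical input is a no-op, the prep step can be rerun on every local iteration without a delete-and-recreate cycle.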

Bash parallelization: make install and delete_om_projects.sh are backgrounded early in the script and waited on before the deploy step.

Redundant work elimination: create_image_registries_secret was called twice per run (once in configure_container_auth.sh, again in configure_operator.sh). Gated the second call behind RUNNING_IN_EVG so it only runs in CI. Similarly, my-project/my-credentials creation is skipped in local multi-cluster runs since the Go binary handles it.

Smaller wins: kubectl delete + create converted to kubectl apply everywhere. Makefile sentinel file skips CRD regeneration when api/*.go hasn't changed. go build -o bin/X instead of go run (build cache makes repeated runs instant).

CI impact

All changes are gated to only affect local runs. CI paths (e2e.sh, single_e2e.sh) are unchanged in behavior — RUNNING_IN_EVG guards ensure CI still creates pull secrets and config resources through the existing shell path. The Go binary is only invoked from prepare_local_e2e_run.sh.

Follow-ups

Two more changes can be considered to further reduce the total time (by ~3-5s):

  • Run the kubectl plugin multi-cluster setup concurrently (e.g. via a --parallel flag).
  • Further parallelize reset.go (not only across clusters, but also within each cluster).

Proof of Work

Manually ran different e2e tests in different variants to assert correctness. This script doesn't run in CI.

Performance

Ran locally against a 4-cluster KIND setup (1 central + 3 members), where the changes matter most. Timed 3 runs of each version, both from a cold start (no compilation cache, CRDs need regenerating) and warm (caches populated).

                     Cold start     Warm start
                     ----------     ----------
Before (sequential)  51.06s         44.77s
After  (all changes) 19.43s         15.36s

Checklist

  • Have you linked a jira ticket and/or is the ticket in the title?
  • Have you checked whether your jira ticket required DOCSP changes?
  • Have you added changelog file?

@Julien-Ben Julien-Ben added the skip-changelog Use this label in Pull Request to not require new changelog entry file label Feb 11, 2026

github-actions bot commented Feb 11, 2026

⚠️ (this preview might not be accurate if the PR is not rebased on current master branch)

MCK 1.7.1 Release Notes

Other Changes

  • Container images: Merged the init-database and init-appdb init container images into a single init-database image. The init-appdb image will no longer be published and does not affect existing deployments.

@Julien-Ben Julien-Ben requested a review from lsierant February 11, 2026 13:19
@Julien-Ben Julien-Ben marked this pull request as ready for review February 13, 2026 11:09
@Julien-Ben Julien-Ben requested a review from a team as a code owner February 13, 2026 11:09
@Julien-Ben Julien-Ben requested review from filipcirtog and removed request for lsierant February 13, 2026 11:09
@Julien-Ben Julien-Ben marked this pull request as draft February 13, 2026 11:10
@Julien-Ben Julien-Ben marked this pull request as ready for review February 13, 2026 11:10
@Julien-Ben Julien-Ben requested review from lsierant and nammn February 13, 2026 11:10
mtls:
mode: STRICT`, namespace)

cmd := exec.Command("kubectl", "--context", cluster, "-n", namespace, "apply", "-f", "-")
Contributor

what the hack! 😅
Let's use GroupVersionKind and Unstructured data with the YAML payload, essentially mimicking kubectl apply but talking to the API server directly. We don't need to fall back to kubectl.
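For context, a rough sketch of what an apply without kubectl looks like on the wire, assuming server-side apply is the mechanism meant here. It deliberately swaps client-go for plain net/http so it runs without dependencies, and it is not the code from the follow-up commit. `resourceRef`, `newApplyRequest`, the field manager name, and every value in `main` are hypothetical; authentication and TLS setup are left out.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/url"
	"path"
)

// resourceRef identifies one namespaced object in a named API group,
// similar in spirit to a group/version/resource plus namespace and name.
type resourceRef struct {
	Group, Version, Resource, Namespace, Name string
}

// newApplyRequest builds a server-side apply call: a PATCH carrying the
// manifest, the apply-patch content type, and a field manager.
func newApplyRequest(apiServer string, ref resourceRef, fieldManager string, manifest []byte) (*http.Request, error) {
	u, err := url.Parse(apiServer)
	if err != nil {
		return nil, err
	}
	// Named API groups live under /apis; the core group (under /api/v1) is not handled here.
	u.Path = path.Join("/apis", ref.Group, ref.Version, "namespaces", ref.Namespace, ref.Resource, ref.Name)
	u.RawQuery = url.Values{"fieldManager": {fieldManager}}.Encode()

	req, err := http.NewRequest(http.MethodPatch, u.String(), bytes.NewReader(manifest))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/apply-patch+yaml")
	return req, nil
}

func main() {
	// Hypothetical resource and manifest, for illustration only.
	ref := resourceRef{Group: "example.io", Version: "v1", Resource: "widgets", Namespace: "e2e", Name: "default"}
	manifest := []byte("apiVersion: example.io/v1\nkind: Widget\nmetadata:\n  name: default\n")

	req, err := newApplyRequest("https://127.0.0.1:6443", ref, "e2e-prep", manifest)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

In client-go the same request would normally go through the dynamic client with an Unstructured object rather than being built by hand.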

Collaborator Author

@Julien-Ben Julien-Ben Feb 16, 2026

Oh, I did it that way initially to avoid initializing a second (untyped) client, out of laziness.

But here is the fix: 318e91e

orgId: "${OM_ORGID:-}"
EOF

# Note: create_image_registries_secret is already called by configure_operator.sh
Collaborator

Do you think this comment will still be useful later on, assuming this gets merged?

# We always create the image pull secret from the docker config.json which gives access to all necessary image repositories
create_image_registries_secret
# In local runs, configure_container_auth.sh already creates the pull secret in all clusters
# before this script runs. Skip the redundant call to avoid a ~0.9s delete+recreate cycle
Collaborator

I am not sure whether those comments make sense here rather than as inline comments in the PR. All of those changes are tied to the prior implementation.

# across all clusters. In CI (Evergreen), this script is the only place the pull secret is
# created (e2e.sh sources it directly without configure_container_auth.sh).
if [[ "${RUNNING_IN_EVG:-false}" == "true" ]]; then
create_image_registries_secret
Collaborator

Can't we instead update create_image_registries_secret to skip when the secret already exists? That would remove the need for the RUNNING_IN_EVG check.

}

// Phase 7: Copy or build kubectl-mongodb binary
func setupKubectlMongodb(cfg config) error {
Collaborator

nit: I think we could split the function into 2 subfunctions for readability:

  • checkExists()
  • buildFromSource()

basically your comments but as subfunctions.
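One possible shape for that split, as a sketch only. The real function's body is not shown in this thread, so the `config` fields and the reuse-or-build behaviour are assumptions (the original comment says "copy or build"; copying is omitted here), and the `go build` invocation is illustrative.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// config holds only the fields this sketch needs; both names are hypothetical.
type config struct {
	sourceDir  string // directory containing the plugin's main package
	binaryPath string // where the built binary should end up
}

// checkExists reports whether path points at an existing regular file.
func checkExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.Mode().IsRegular()
}

// buildFromSource compiles the package in cfg.sourceDir into cfg.binaryPath.
// A relative binaryPath is resolved against sourceDir, because the command runs there.
func buildFromSource(cfg config) error {
	cmd := exec.Command("go", "build", "-o", cfg.binaryPath, ".")
	cmd.Dir = cfg.sourceDir
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("go build failed: %w\n%s", err, out)
	}
	return nil
}

// setupKubectlMongodb reuses an existing binary when there is one and builds otherwise.
func setupKubectlMongodb(cfg config) error {
	if checkExists(cfg.binaryPath) {
		return nil
	}
	return buildFromSource(cfg)
}

func main() {
	cfg := config{sourceDir: ".", binaryPath: "bin/kubectl-mongodb"}
	// Only the cheap check is exercised here; setupKubectlMongodb would trigger a build.
	fmt.Println("binary already present:", checkExists(cfg.binaryPath))
}
```

The top-level function then reads as the two comments it replaces, and each helper can be tested on its own.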

}

// Write central_cluster name
writeConfigFile(cfg.multiClusterConfigDir, "central_cluster", secret.Data["central_cluster"], collectError)
Collaborator

Where do we write to here? Do we even save anything by doing this concurrently?

3 participants