Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add expand-and-modify test type to ebs-scale-test #2330

Merged
merged 2 commits into from
Feb 10, 2025

Conversation

AndrewSirenko
Copy link
Contributor

@AndrewSirenko AndrewSirenko commented Feb 6, 2025

What type of PR is this?

/kind documentation

What is this PR about? / Why do we need it?

Add ebs-scale-test tool TEST_TYPE='expand-and-modify' to guard against regressions in how EBS CSI Driver handles concurrent volume modifications and/or expansions.

Creates $REPLICAS block volumes. Patches PVC capacity and VACName at rate of 5 PVCs per second. Ensures PVCs are expanded and modified before deleting them. Exercises ControllerExpandVolume & ControllerModifyVolume. Set MODIFY_ONLY or EXPAND_ONLY to 'true' to test solely volume modification/expansion.

Note: As of Feb 2025, EBS CSI Driver can handle high amounts of concurrent modifications expansions at a rate of 5 patches per second. However due to our current coalescing implementation and the current AWS EC2 ModifyVolume 6 hour cooldown, modifying AND expanding more than 1 volume per second may lead to errors.

How was this change tested?

❯ ./scale-test create; ./scale-test run; ./scale-test clean

See gist for full logs proof

Also tested that MODIFY_ONLY and EXPAND_ONLY env vars work.

Shortened version:

❯ export CLUSTER_TYPE='pre-allocated'
❯ export REPLICAS=10
❯ export TEST_TYPE=expand-and-modify
❯ ./scale-test create; ./scale-test run; ./scale-test clean
Cluster created...
Updating kubeconfig and restarting ebs-csi-controller Deployment
Updated context arn:aws:eks:us-west-2:208536326202:cluster/ebs-scale-pre-allocated in /Users/andsirey/.kube/config
deployment.apps/ebs-csi-controller restarted
Waiting for deployment "ebs-csi-controller" rollout to finish: 0 of 1 updated replicas are available...
deployment "ebs-csi-controller" successfully rolled out
Applying /Volumes/workplace/aws-ebs-csi-driver/hack/ebs-scale-test/helpers/scale-test/expand-and-modify-test/expand-and-modify.yaml. Exported to /tmp/ebs-scale-test/ebs-scale-pre-allocated-expand-and-modify-10-2025-02-06T20:01UTC/expand-and-modify.yaml
storageclass.storage.k8s.io/ebs-scale-test-expand-and-modify created
volumeattributesclass.storage.k8s.io/ebs-scale-test-expand-and-modify created
persistentvolumeclaim/ebs-claim-0 created
....
Wait for all PVCs to be bound
Patching PVCs with [{"op": "replace", "path": "/spec/volumeAttributesClassName", "value": "ebs-scale-test-expand-and-modify"},{"op": "replace", "path": "/spec/resources/requests/storage", "value": "2Gi"}]
persistentvolumeclaim/ebs-claim-0 patched
....
Waiting until volumes modified and/or expanded
0/10 volumes modified
10/10 volumes modified
All volumes modified
6/10 volumes expanded
10/10 volumes expanded
All volumes expanded
Deleting resources
storageclass.storage.k8s.io "ebs-scale-test-expand-and-modify" deleted
volumeattributesclass.storage.k8s.io "ebs-scale-test-expand-and-modify" deleted
persistentvolumeclaim "ebs-claim-0" deleted
...
persistentvolumeclaim "ebs-claim-9" deleted
Waiting for all PVs to be deleted
No resources found
No PVs exist in the cluster, proceeding...
Port-forwarding ebs-csi-controller containers
...
Deleting cluster
...
2025-02-06 15:13:52 [✔]  all cluster resources were deleted

Does this PR introduce a user-facing change?

Add volume expansion & modification scalability test type to `ebs-scale-test` tool

@k8s-ci-robot k8s-ci-robot added the kind/documentation Categorizes issue or PR as related to documentation. label Feb 6, 2025
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 6, 2025
Copy link

github-actions bot commented Feb 6, 2025

Code Coverage Diff

This PR does not change the code coverage

parameters:
type: gp3
---
{{- range $index, $value := seq 1 .Env.REPLICAS }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided to rely on Immediately bound PVCs instead of scaling statefulset (as we do in the scale-sts test).

Pros:

  • Separation of concerns between scale-sts test and expand-and-modify test
  • Order of magnitude shorter test: no waiting for all volumes to attach/detach + creating karpenter instances.
  • Two way door. Changing this test to scale sts is trivial.

Cons:

  • We do not excercise NodeExpandVolume at scale. (Though to me this sounds like a kubelet, AMI, resize2fs testing responsibility. We already have e2e tests)

Discussed this offline with @torredil, who lightly agreed with me.

ElijahQuinones

This comment was marked as resolved.

Copy link
Member

@ElijahQuinones ElijahQuinones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will lgtm once other comments are addressed

persistentvolumeclaim/ebs-claim-2 patched
persistentvolumeclaim/ebs-claim-1 patched
persistentvolumeclaim/ebs-claim-0 patched
persistentvolumeclaim/ebs-claim-3 patched
persistentvolumeclaim/ebs-claim-4 patched
persistentvolumeclaim/ebs-claim-6 patched
persistentvolumeclaim/ebs-claim-5 patched
persistentvolumeclaim/ebs-claim-7 patched
persistentvolumeclaim/ebs-claim-8 patched
persistentvolumeclaim/ebs-claim-9 patched
Waiting until volumes modified and/or expanded
0/10 volumes modified
10/10 volumes modified
All volumes modified
8/10 volumes expanded
10/10 volumes expanded
All volumes expanded
Deleting resources

verified all resources where delete properly.

Copy link
Member

@ElijahQuinones ElijahQuinones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 7, 2025
Copy link
Member

@torredil torredil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: torredil

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 10, 2025
@k8s-ci-robot k8s-ci-robot merged commit 4f4c7b7 into kubernetes-sigs:master Feb 10, 2025
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/documentation Categorizes issue or PR as related to documentation. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants