Update document about error handling #5462

gnufied · 2025-07-29T16:08:59Z

This PR documents error handling in external-resizer and how each type of error is handled.

gnufied · 2025-07-29T16:09:12Z

/assign @msau42 @jsafrane @xing-yang

msau42 · 2025-07-29T17:24:04Z

@sunnylovestiramisu @AndrewSirenko

keps/sig-storage/3751-volume-attributes-class/README.md

AndrewSirenko · 2025-07-29T18:39:43Z

keps/sig-storage/3751-volume-attributes-class/README.md

+
+In general Kubernetes sidecars classify all CSI errors in three different classes. Namely:
+
+- Non-final errors (such as `DeadlineExceeded`), which indicate a transient error, which may be because of timeout or some other temporary failure.


Would add to non-final errors that "CO is unsure if volume was modified"

Because CSI Operation CAN time out despite volume modification occurring in storage provider. E.g. storage providers where modifications may take a while

Kinda added extra wording. But not exactly what you wrote above. Please do check.
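
For readers following the thread, here is a minimal sketch of the three-way classification being discussed. It only covers the codes named in this conversation (`DeadlineExceeded`, `INVALID_ARGUMENT`, `NOT_FOUND`, `INTERNAL`). The `Code` type, the `Classify` function and the choice to treat every unlisted code as final are assumptions made for illustration, not the external-resizer implementation.

```go
package main

import "fmt"

// Code is a hypothetical stand-in for a gRPC status code; a real sidecar
// would use the gRPC codes package instead.
type Code string

const (
	DeadlineExceeded Code = "DEADLINE_EXCEEDED"
	InvalidArgument  Code = "INVALID_ARGUMENT"
	NotFound         Code = "NOT_FOUND"
	Internal         Code = "INTERNAL"
)

// ErrorClass is one of the three classes discussed in this thread.
type ErrorClass int

const (
	// NonFinal: the outcome is unknown, so the CO cannot tell whether
	// the volume was modified.
	NonFinal ErrorClass = iota
	// Final: the call finished and failed; partial changes are possible.
	Final
	// Infeasible: final, and the SP must not have changed the volume.
	Infeasible
)

func (c ErrorClass) String() string {
	return [...]string{"non-final", "final", "infeasible"}[c]
}

// Classify maps only the codes named in this thread. Treating every other
// code as Final is an assumption of this sketch.
func Classify(code Code) ErrorClass {
	switch code {
	case DeadlineExceeded:
		return NonFinal
	case InvalidArgument, NotFound:
		return Infeasible
	default:
		return Final
	}
}

func main() {
	for _, c := range []Code{DeadlineExceeded, InvalidArgument, NotFound, Internal} {
		fmt.Printf("%s -> %s\n", c, Classify(c))
	}
}
```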

AndrewSirenko · 2025-07-29T20:59:06Z

keps/sig-storage/3751-volume-attributes-class/README.md

+
+#### Handling of infeasible errors
+
+If volume modification to a VAC is failing with a final and infeasible error, then users can either set VAC to previously specified value in `status.currentVolumeAttributesClass` or set to `nil` if no VAC was specified. In both the cases, external-resizer will stop trying to reconcile the volume modification. 


> If volume modification to a VAC is failing with a final and infeasible error

I thought we ONLY cancel modification on infeasible err.

This is to prevent partial modification on final errs like InternalErr, which could lead to half-modified volumes for drivers.

I was also thinking we ONLY cancel modification(rollback) on Infeasible err, kubernetes-csi/external-resizer#487 is based on this assumption.

I think he's just stating that infeasible errors are a subset of final errors. All infeasible errors are final, but not all final errors are infeasible. This should probably say:

"failing with an infeasible error (but not other final errors),"

Yes, I meant "and" to do some heavy lifting here. Since infeasible errors are already final, both conditions must be true. I will update the wording.

I fixed it. PTAL.
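
The rule this thread converges on can be sketched as follows: a pending modification may be dropped without another `ControllerModifyVolume` call only when the last error was infeasible and the user has set the VAC back to the value recorded in status. The `PVC` struct, its field names and `CanCancelModification` are hypothetical names for illustration and are not taken from external-resizer.

```go
package main

import "fmt"

// ErrorClass mirrors the three classes discussed in this thread.
type ErrorClass int

const (
	NonFinal ErrorClass = iota
	Final
	Infeasible
)

// PVC carries only the two fields this sketch needs. A nil pointer means
// that no VAC is set.
type PVC struct {
	SpecVAC    *string // VAC requested in the PVC spec
	CurrentVAC *string // VAC recorded in the PVC status as currently applied
}

// sameVAC compares two optional VAC names, treating nil as "no VAC".
func sameVAC(a, b *string) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

// CanCancelModification reports whether reconciliation may stop without
// calling the SP again: only for infeasible errors (not other final
// errors), and only when the spec matches the current status again.
func CanCancelModification(pvc PVC, lastErr ErrorClass) bool {
	return lastErr == Infeasible && sameVAC(pvc.SpecVAC, pvc.CurrentVAC)
}

func main() {
	a, b := "A", "B"
	fmt.Println(CanCancelModification(PVC{SpecVAC: &a, CurrentVAC: &a}, Infeasible))
	fmt.Println(CanCancelModification(PVC{SpecVAC: &a, CurrentVAC: &a}, Final))
	fmt.Println(CanCancelModification(PVC{SpecVAC: &b, CurrentVAC: &a}, Infeasible))
	fmt.Println(CanCancelModification(PVC{}, Infeasible))
}
```

Note that when the status already records a VAC, a `nil` spec never matches it in this sketch, which is consistent with the KEP text saying that resetting to `nil` is not allowed in that case.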

bswartz · 2025-07-29T23:10:51Z

keps/sig-storage/3751-volume-attributes-class/README.md

+
+#### Handling of infeasible errors
+
+If volume modification to a VAC is failing with a final and infeasible error, then users can either set VAC to previously specified value in `status.currentVolumeAttributesClass` or set to `nil` if no VAC was specified. In both the cases, external-resizer will stop trying to reconcile the volume modification. 


I think he's just stating that infeasible errors are a subset of final errors. All infeasible errors are final, but not all final errors are infeasible. This should probably say:

"failing with an infeasible error (but not other final errors),"

AndrewSirenko

/lgtm

huww98 · 2025-07-31T14:25:59Z

keps/sig-storage/3751-volume-attributes-class/README.md

+
+Please note if PVC already had a `currentVolumeAttributesClass` in its status, then setting VAC to `nil` is not allowed.
+
+It is possible that if there were one or more partial volume modifications that happened before on the volume, they will not be undone when this happens because for infeasible errors no `ControllerModifyVolume` will be called when user resets the VAC. This mechanism exists only to prevent perpetual call to `ControllerModifyVolume` for volume modifications which are never going to succeed. Storage providers and users are recommended to roll forward to different VAC, if desired behaviour is resetting the VAC to some pre-specified value for all `mutable_parameters`.


As a developer of a CSI driver and a cluster admin of our infra, I still cannot accept this.

> they will not be undone

This means that when I specify my volume to have 2000 IOPS and PVC.status tells me the reconcile has finished, my volume may actually have only 1000 IOPS. And I can never observe the abnormality from the Kubernetes API until something more serious goes wrong:

- If the performance is higher than expected, it will incur extra cost.
- If the performance is lower than expected, it can result in unexpected latency for the workload, or even catastrophic system failure.
- If we add topology integration to VAC, it also means that PV nodeAffinity can be out of sync, which will cause Pods to be pending or stuck because they were scheduled to the wrong node.
- It is also subject to potential quota abuse.

> This mechanism exists only to prevent perpetual call

This does not make sense. After the VAC is rolled back, if the volume is already at the desired state, the SP should just return OK and do nothing. There is no reason the call will be perpetual. If the volume is actually partially modified and cannot be rolled back by the SP, it is better to let the user notice this rather than hide it.

We never end the reconcile process with a failed gRPC call, e.g.:

- We only delete VolumeAttachment if ControllerUnpublishVolume returns OK.
- We only clear PVC.Status.AllocatedResourceStatuses if ControllerExpandVolume returns OK.

So we should do the same: only clear PVC.Status.ModifyVolumeStatus if ControllerModifyVolume returns OK, and never cancel modification.
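
The principle argued for in this comment can be sketched as below. This illustrates the commenter's proposal, not the behaviour the PR documents. `PVCStatus`, `reconcileOnce` and the two fake `modify` functions are hypothetical, and the `"InProgress"` string is only a placeholder for "a modification is still pending".

```go
package main

import (
	"errors"
	"fmt"
)

// PVCStatus holds the two status fields this sketch cares about.
type PVCStatus struct {
	CurrentVAC         string
	ModifyVolumeStatus string // empty means no modification is pending
}

var errInternal = errors.New("INTERNAL")

// failingModify and succeedingModify stand in for ControllerModifyVolume.
func failingModify(string) error    { return errInternal }
func succeedingModify(string) error { return nil }

// reconcileOnce clears ModifyVolumeStatus only when the RPC returns OK.
// On any error the status stays set, so the mismatch remains visible.
func reconcileOnce(st *PVCStatus, target string, modify func(string) error) error {
	if err := modify(target); err != nil {
		st.ModifyVolumeStatus = "InProgress" // placeholder value
		return err
	}
	st.CurrentVAC = target
	st.ModifyVolumeStatus = ""
	return nil
}

func main() {
	st := &PVCStatus{CurrentVAC: "A"}

	err := reconcileOnce(st, "B", failingModify)
	fmt.Println(err, *st)

	err = reconcileOnce(st, "B", succeedingModify)
	fmt.Println(err, *st)
}
```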

> Storage providers and users are recommended to roll forward to different VAC, if desired behaviour is resetting the VAC to some pre-specified value for all mutable_parameters.

In Kubernetes, spec specifies the desired state, not an action. Whichever state the user specifies, we should try to bring the underlying system to that state. It would be ridiculous if two VACs specify the same state but only one of them works.

This change is being made in compliance with the CSI spec change container-storage-interface/spec#597. "Infeasible" in this KEP refers to the errors in the CSI spec such as NOT_FOUND and INVALID_ARGUMENT for which the SP must not have made any changes. The proposed sidecar behavior depends on the SP's spec compliance for safety in this case. If the SP returns a final non-infeasible error (such as INTERNAL) then the wording for final errors applies, and it's up to cluster operators to ensure that any new VAC will overwrite any partially applied settings from an older VAC.

See my comment at container-storage-interface/spec#597 (comment)
Basically, if some other errors happened before INVALID_ARGUMENT, we are still not sure about the volume state.

We're taking the stance that, if that happens, then the SP has violated the spec, either intentionally or through a bug. In either case, undefined behavior could occur. It is incumbent on the SP to prevent the case you describe (returning INVALID_ARGUMENT after making some changes) in order to remain spec compliant. We realize this is a tightening of the spec, which is why we want to do it while this RPC is still tagged alpha.

No, we are discussing the same case as you replied at container-storage-interface/spec#597 (comment)

I'd state the issue here again. Say initially the PVC has VAC A, and the user modified it to B:

retry case

1. CO calls ControllerModifyVolume(B)
2. the request failed, maybe due to process restart, or network error, etc.
3. CO retries ControllerModifyVolume(B)
4. SP returns INVALID_ARGUMENT

With the tightened spec, the SP cannot have made any change between 3 and 4. But the SP can still make a change between 1 and 2 without violating the tightened spec. So assuming the volume is not changed at step 4 is not safe for the CO.

A -> B -> C case

1. CO calls ControllerModifyVolume(B)
2. the request failed with a final error
3. User changed his mind and set the VAC name to C
4. CO calls ControllerModifyVolume(C)
5. SP returns INVALID_ARGUMENT

Similar to the previous case, the SP can still make a change between 1 and 2. Now if the user tries to revert to A, as proposed in this PR, "(partial volume modifications) will not be undone". So the volume will get into the state I described previously: the parameters of A are not applied, but the user cannot detect this from the Kubernetes API.

> I'd state the issue here again. Say initially the PVC has VAC A, and the user modified it to B:
>
> retry case
>
> 1. CO calls ControllerModifyVolume(B)
> 2. the request failed, maybe due to process restart, or network error, etc.
> 3. CO retries ControllerModifyVolume(B)
> 4. SP returns INVALID_ARGUMENT
>
> With the tightened spec, the SP cannot have made any change between 3 and 4. But the SP can still make a change between 1 and 2 without violating the tightened spec. So assuming the volume is not changed at step 4 is not safe for the CO.

If the SP returns INVALID_ARGUMENT in (4), then there must have been some parameter in B which was invalid. What is invalid should not change over time. That's part of what we mean by infeasible -- it's that the passage of time won't change whether the request succeeds or not. Assuming the SP adheres to this logic, then we know nothing could have changed between 1 and 2 either, regardless of the error code returned.

> A -> B -> C case
>
> 1. CO calls ControllerModifyVolume(B)
> 2. the request failed with a final error
> 3. User changed his mind and set the VAC name to C
> 4. CO calls ControllerModifyVolume(C)
> 5. SP returns INVALID_ARGUMENT
>
> Similar to the previous case, the SP can still make a change between 1 and 2. Now if the user tries to revert to A, as proposed in this PR, "(partial volume modifications) will not be undone". So the volume will get into the state I described previously: the parameters of A are not applied, but the user cannot detect this from the Kubernetes API.

In this case, C contained something the SP couldn't understand, and so the SP made no changes between (2) and (5). The actual state of the volume is somewhere between A and B, and the status reflects A. I'm actually unsure which quota is consumed. I assume it's at least quota from A, and maybe also B and/or C (although it can't be all 3).

In any case, this is an admin error, and undefined results are expected if an admin puts bad parameters into a VAC and then does this kind of manipulation. I'm not arguing that the results are perfect or desirable in this case, I'm arguing that it's safe from intentional exploitation (assuming the admin creates sane VACs and quotas), and that it's as good as we can do considering that nothing in the CO remembers the old state, so only forward progress is possible.

> undefined results are expected if an admin puts bad parameters into a VAC

Whether the parameters are valid may depend on many aspects, such as the region/zone where the volume is located, the current status of the volume, etc. It is not something an admin can completely avoid in advance.

> it's as good as we can do considering that nothing in the CO remembers the old state, so only forward progress is possible.

Please take a look at my proposal at https://docs.google.com/document/d/1VebyLSRcngn3_9wqaF5OUmxu60gwX7dN8euXRUqJSXA/edit?usp=sharing

Basically, I'm proposing to call ControllerModifyVolume(A) again in this case, to at least ensure the parameters explicitly specified for A are applied.

> Assuming the SP adheres to this logic, then we know nothing could have changed between 1 and 2 either, regardless of the error code returned.

Yes, I agree that the SP should do this. But an assumption is not something we can rely on. Let's imagine that the SP does this:

```go
changed, err := modify_part_1()
if err != nil {
	...
}
err = modify_part_2()
if err != nil {
	if changed {
		return INTERNAL
	} else {
		return INVALID_ARGUMENT
	}
}
```

On the first attempt, modify_part_1 succeeds but modify_part_2 does not, so the SP returns INTERNAL. On the second attempt, modify_part_1 changes nothing, so the SP returns INVALID_ARGUMENT. I agree this is not recommended, but it is compliant with the CSI spec.
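
To make the two attempts concrete, here is a small runnable simulation of the hypothetical SP described above. The `Volume` struct with two independent settings, and the assumption that the second setting can never be applied, are illustrative only and do not model any real driver.

```go
package main

import "fmt"

// Code is a hypothetical stand-in for a gRPC status code.
type Code string

const (
	OK              Code = "OK"
	Internal        Code = "INTERNAL"
	InvalidArgument Code = "INVALID_ARGUMENT"
)

// Volume models two independently modifiable settings.
type Volume struct {
	Part1 string
	Part2 string
}

// modifyPart1 applies the first setting and reports whether it changed
// anything on the volume.
func modifyPart1(v *Volume, want string) bool {
	if v.Part1 == want {
		return false
	}
	v.Part1 = want
	return true
}

// modifyPart2 always fails in this sketch, standing in for a parameter
// the SP can never satisfy for this volume.
func modifyPart2(_ *Volume, _ string) bool {
	return false
}

// ControllerModifyVolume mirrors the snippet above: it reports INTERNAL
// if this call changed the volume and INVALID_ARGUMENT otherwise.
func ControllerModifyVolume(v *Volume, want1, want2 string) Code {
	changed := modifyPart1(v, want1)
	if !modifyPart2(v, want2) {
		if changed {
			return Internal
		}
		return InvalidArgument
	}
	return OK
}

func main() {
	v := &Volume{Part1: "A", Part2: "A"}
	first := ControllerModifyVolume(v, "B", "B")
	second := ControllerModifyVolume(v, "B", "B")
	// prints: INTERNAL INVALID_ARGUMENT {B A}
	fmt.Println(first, second, *v)
}
```

The second call makes no change and reports an infeasible error, yet the volume is left half-way between A and B because of the first call.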

"With the tightened spec, SP cannot have made any change between 3 and 4. But SP can still make change between 1 and 2, and not violating the tightened spec. So assuming the volume is not changed at step 4 is not safe for CO."

If 2 is a non final error, we do not let a new modification to C happen.

bswartz

/lgtm

bswartz · 2025-07-31T23:08:17Z

keps/sig-storage/3751-volume-attributes-class/README.md

+
+Please note if PVC already had a `currentVolumeAttributesClass` in its status, then setting VAC to `nil` is not allowed.
+
+It is possible that if there were one or more partial volume modifications that happened before on the volume, they will not be undone when this happens because for infeasible errors no `ControllerModifyVolume` will be called when user resets the VAC. This mechanism exists only to prevent perpetual call to `ControllerModifyVolume` for volume modifications which are never going to succeed. Storage providers and users are recommended to roll forward to different VAC, if desired behaviour is resetting the VAC to some pre-specified value for all `mutable_parameters`.


This change is being made in compliance with the CSI spec change container-storage-interface/spec#597. "Infeasible" in this KEP refers to the errors in the CSI spec such as NOT_FOUND and INVALID_ARGUMENT for which the SP must not have made any changes. The proposed sidecar behavior depends on the SP's spec compliance for safety in this case. If the SP returns a final non-infeasible error (such as INTERNAL) then the wording for final errors applies, and it's up to cluster operators to ensure that any new VAC will overwrite any partially applied settings from an older VAC.

k8s-ci-robot · 2025-07-31T23:08:35Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bswartz, gnufied
Once this PR has been reviewed and has the lgtm label, please ask for approval from jsafrane. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

keps/sig-storage/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2025-08-11T21:40:29Z

New changes are detected. LGTM label has been removed.

huww98 · 2025-08-17T07:29:13Z

Now this PR contains the strict-mode as we discussed in the meeting, trying to make sure the quota set by the admin is effective in any situation. But the current proposal still allows unrestricted usage if the rollback also fails. If we really add this to the enhancement doc, I'd like this to be addressed. I will try to add another strict-mode proposal to my PR tomorrow.

While strict-mode may be valuable, to be honest, I think we will not support such a strict mode at Alibaba Cloud. Consider the "nil -> A" case in strict mode, where once the user has chosen VAC A, it has to succeed before the user can do anything else. In our case, whether our ControllerModifyVolume can succeed depends on a lot of factors, e.g. zone, disk size, nodes attached, disk type before modification, etc. So even if the VAC itself is valid, a user will easily choose a VAC that is invalid for his specific case the first time. I think this is really frustrating for users, and will add a tremendous workload for cluster admins and our support team to correct these errors.

And here is my reasoning about why we don't need strict-mode generally (agreeing with what @bswartz said at the meeting). Normally, the cluster admin sets up VACs and quotas. Users deploy workloads and consume the quotas. The admin pays for the cluster, so he sets up quotas to ensure the budget is met. Users generally don't need to worry about the quota unless they exceed it. Strict-mode would help the admin better control the budget and defend against malicious users in situations where the SP somehow made a partial modification to the volume and failed, and the user then switched the target to another VAC which also failed. If we don't have strict-mode, the admin still has options to control the expense:

- adjust the VAC or ask the SP to fix the partial modification issue.
- inspect the PVCs stuck in an error state for a long time and try to fix them.
- find malicious users and ask them not to do this, assuming admin and users are normally in the same organization.

So I wonder how many vendors would really be willing to support this strict mode? @gnufied I think you mentioned a use case where the user is charged according to the quota they are allocated? I have some doubts about this because changing quota does not actually allocate any resources. Would users ever agree to be charged by quota? If only one vendor wants this, would it make sense to implement this in a fork of external-resizer? (I believe many of the vendors already maintain a branch now, and so do we.)

sunnylovestiramisu · 2025-08-27T18:01:13Z

Since behavior has changed, we need to also update the k/k types.go description:

https://github.com/kubernetes/kubernetes/blob/6dff95db7983bed3fc821e0c57431114310a9ca4/pkg/apis/core/types.go#L565-L569

I am not sure if we can also backport the comment change to 1.34?

gnufied · 2025-08-27T18:27:31Z

> I am not sure if we can also backport the comment change to 1.34?

That is a good point. I am afraid we will have to. cc @msau42

sunnylovestiramisu · 2025-09-02T18:47:50Z

keps/sig-storage/3751-volume-attributes-class/README.md

+
+##### Transition from nil-VAC to VAC(A)
+
+If volume modification to a VAC is failing with final but not-infeasible error, then external-resizer will keep trying to reconcile to VAC(A), regardless of any user initiated changes in `.spec.volumeAttributeClassName`. Only after transition to VAC(A) is successful, the user is allowed to move the PVC to a different VAC.


What if transition to VAC(A) is never successful but the final error is not-infeasible? Then it retries forever?

msau42 · 2025-09-03T17:58:27Z

> > I am not sure if we can also backport the comment change to 1.34?
>
> That is a good point. I am afraid we will have to. cc @msau42

What is the proposed change? IMO, calling `ControllerModifyVolume` is an implementation detail. To the end user, it's still cancelling.

gnufied · 2025-09-04T16:11:47Z

> What is the proposed change? IMO, calling `ControllerModifyVolume` is an implementation detail. To the end user, it's still cancelling.

Yes, it can be argued that way. I see arguments both ways.

Anyway, I am going to close this change since we agreed to accept #5482 as the proposed change.

#5482 solves this problem by having more relaxed handling of quota issues with tradeoffs we discussed.

k8s-ci-robot added the cncf-cla: yes and kind/kep labels Jul 29, 2025

k8s-ci-robot requested review from saad-ali and xing-yang July 29, 2025 16:09

k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Jul 29, 2025

k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 29, 2025

k8s-ci-robot assigned jsafrane, msau42 and xing-yang Jul 29, 2025

gnufied mentioned this pull request Jul 29, 2025

Collapse the INVALID_ARGUMENTS error rows and clarify container-storage-interface/spec#597

Merged

gnufied force-pushed the update-vac-kep branch from 3ce4800 to a15211b Compare July 29, 2025 16:24

sunnylovestiramisu reviewed Jul 29, 2025

sunnylovestiramisu reviewed Jul 29, 2025

sunnylovestiramisu reviewed Jul 29, 2025

AndrewSirenko reviewed Jul 29, 2025

gnufied force-pushed the update-vac-kep branch from a15211b to 1495478 Compare July 29, 2025 20:14

AndrewSirenko reviewed Jul 29, 2025

bswartz suggested changes Jul 29, 2025

AndrewSirenko reviewed Jul 31, 2025

k8s-ci-robot assigned AndrewSirenko Jul 31, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 31, 2025

huww98 reviewed Jul 31, 2025

Update document about error handling

f533296

gnufied force-pushed the update-vac-kep branch from 1495478 to f533296 Compare July 31, 2025 16:04

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 31, 2025

bswartz approved these changes Jul 31, 2025

k8s-ci-robot assigned bswartz Jul 31, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 31, 2025

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 11, 2025

gnufied force-pushed the update-vac-kep branch 2 times, most recently from eb1caf5 to 7298e36 Compare August 12, 2025 14:44

Add new flow

f7f4412

gnufied force-pushed the update-vac-kep branch from 7298e36 to f7f4412 Compare August 12, 2025 15:03

huww98 mentioned this pull request Aug 16, 2025

KEP-3751: add error handling #5482

Open

huww98 mentioned this pull request Aug 17, 2025

KEP-3751: add error handling strict mode #5485

Open

sunnylovestiramisu reviewed Sep 2, 2025

gnufied closed this Sep 4, 2025

