-
Notifications
You must be signed in to change notification settings - Fork 590
nvmeof: add qos ability #5714
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nvmeof: add qos ability #5714
Conversation
f889a3f to
6624497
Compare
| // Step 3: Get NVMe-oF metadata for the volume | ||
|
|
||
| // Since ControllerModifyVolume doesn't receive volume context and dont have option to take secrets | ||
| // because there is no "csi.storage.k8s.io/controller-modify-secret-name" field in the SC !, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ControllerModifyVolume can have secrets according to the specification: https://github.com/container-storage-interface/spec/blob/master/spec.md#controllermodifyvolume
Do you mean that the resizer sidecar does not pass the secret from Kubernetes to the CSI-driver? Is there an issue for that at https://github.com/kubernetes-csi/external-resizer , or can we provide a PR for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I saw it has that, but when I ran it the secret was nil, so I thought maybe I need to add into StorageClass
csi.storage.k8s.io/controller-modify-secret-name: csi-rbd-secret
csi.storage.k8s.io/controller-modify-secret-namespace: defaultlike I have for ControllerPublishVolume()
csi.storage.k8s.io/controller-publish-secret-name: csi-rbd-secret
csi.storage.k8s.io/controller-publish-secret-namespace: defaultbut what I see here:
https://kubernetes-csi.github.io/docs/secrets-and-credentials-storage-class.html#storageclass-secrets
there is no for modify. maybe I am wrong, but I could not find any place that mentioned that..
So, I implemented like ControllerUnPublishVolume() here:
| secretName, secretNamespace, err := util.GetControllerPublishSecretRef(req.GetVolumeId(), util.RBDType) |
About the resizer, I am not sure how it should pass the secret. I will look at it and see if it is unimplemented\issue or something else..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Secrets are indeed not fetched, they are always nil:
I do not think it is possible to get secrets as part of the ControllerModifyVolume call, so these need to be resolved through an other way (like the new default credentials).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe kubernetes-csi/external-resizer#544 is acceptable, would you be able to test with that container-image?
| // For now it only updates the capacity in the response as NVMe-oF | ||
| // this must be added because ControllerModifyVolume requires the sidecar csi-resizer. and | ||
| // csi-resizer searches for the capacity ControllerServiceCapability_RPC_EXPAND_VOLUME. | ||
| // In the future, if NVMe-oF gateway supports volume expansion, the logic must be added here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like a shortcoming of the external-resizer sidecar. Only ModifyVolume should be allowed. But there seems to be a required check on volume resize: https://github.com/kubernetes-csi/external-resizer/blob/master/pkg/resizer/csi_resizer.go#L59
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nixpanic i think we should open a tracker with the external-resizer as there could be drivers that support ModiyVolume but not Expand volume.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created kubernetes-csi/external-resizer#545 for that, but a draft for now as I have not tested it yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was merged. The ControllerExpandVolume procedure can be kept until the external-resizer has a release. (Or maybe by that time nvmeof implements resizing too.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nixpanic , Aviv told me we should implement the ControllerExpandVolume() (resize nvmeof volume). So, in any case we need this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes,ControllerExpandVolume needs to be implemented too, but in a separate PR 😄
2ade41c to
57c22b1
Compare
57c22b1 to
359a34b
Compare
c49f033 to
4635779
Compare
internal/nvmeof/qos.go
Outdated
| RwIosPerSecond = "nvmeof_rw_ios_per_second" | ||
| RwMbytesPerSecond = "nvmeof_rw_mbytes_per_second" | ||
| RMbytesPerSecond = "nvmeof_r_mbytes_per_second" | ||
| WMbytesPerSecond = "nvmeof_w_mbytes_per_second" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that we do not have any parameters in the StorageClass that contain under_scores. Please use camelCase format for parameter keys. There is no need for the nvmeof prefix either, these parameters should only be documented for the NVMe-oF controller/provisioner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes lets use CephCSI specifc keys for user facing configuration and translate it internall to keep the keys standard
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok , I will use camelCase format.
FYI, these params will not be shown in the SC anymore, only in VAC yaml file.
I added an example in this PR here:
#5714 (comment)
I think if we have 2 different types of VACs , for rbd and for nvmeof (once the rbd will implement the modify).
the VAC has field driverName: nvmeof.csi.ceph.com, it means only PVCs that created by nvmeof driver can be affected by this VAC. BUT I can create VAC for "nvmeof PVC" that has rbd qos params.
So I must somehow distinguish between these vars.
I can remove the nvmeof prefix, to be e.g. RwIosPerSecond but it is very general, is not?
in this VAC I need to know what params are for nvmeof and what for rbd.
with the nvmeof prefix it will be clearner IMO , what do you think? @nixpanic , @Madhu-1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gadididi lets take a step back, do we need to support RBD QOS for the rbd images created using nvmeof csi driver? what are the advantages of it? if there is no value we should not support it, just support QOS using nvmeof only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Madhu-1, I did not see specific requirement for that. but I think if there is already implementation of QoS in the RBD controller, and there is an option to set rbd QoS on nvmeof volumes (via rbd cli on the rbd image), maybe is it good to have this option here (in nvmeof controller too)
UPDATE:
I was wrong about the RBD QoS
I asked my manager Aviv, and he said, it is not recommended to have both together (as I said before) but also it is not recommended to allow to the user to set QoS on rbd , when the user uses nvmeof volumes. the purpose of the nvmeof is expose higher level of setting
So, what I am going to do:
1. Parse mutable_parameters
2. Check if contains RBD QoS params?
- YES: Return INVALID_ARGUMENT (RBD QoS not supported for NVMe-oF volumes)
- NO: Continue
3. Check if contains NVMe-oF QoS params?
- NO: Return SUCCESS (no-op, nothing to modify)
- YES: Continue
4. Apply NVMe-oF QoS via gateway
5. If gateway error EXIST: means rbd QoS was set (not recommended for the user , but he did it ) return ErrorExistst
6. if gateway return another error : Return INTERNAL
7. else Return SUCCESSwhat do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can remove the nvmeof prefix, to be e.g. RwIosPerSecond but it is very general, is not?
Yes, the name of the parameter is general. But the StorageClass is for NVMe-oF, so it is still specific enough that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good and it also keep very minimal and exactly what we need to have
| // No QoS parameters - nothing to do | ||
| log.DebugLog(ctx, "No QoS parameters provided for volume %s", volumeID) | ||
|
|
||
| return &csi.ControllerModifyVolumeResponse{}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question:- If the values are unset or removed from the VAC, dont we need to remove these QOS from the volume as well. In other words how a user can remove QOS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Madhu-1 , Removing volumeAttributesClassName doesn't trigger ControllerModifyVolume()
So, how to remove QoS (means set to unlimited all params), 2 options:
-
create removal_qos VAC with empty params. once I get it (in the ControllerModifyVolume()) by patching the new removal_qos VAC , I see no params at all - means set to unlimited.
-
create removal_qos VAC with 0 params to all 4 fields. once I get it (in the ControllerModifyVolume()) by patching the new removal_qos VAC , it calls as always to the grpc set_qos .
currently code support option 2. if I get empty VAC (no params) it finishes with no error
IMO option 2 is better. what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing volumeAttributesClassName doesn't trigger ControllerModifyVolume()
Is this something supported? we need to test with kubernetes 1.34, If yes yes option 2 make sense to me. we need to document these behaviour to set the right expectations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create removal_qos VAC with 0 params to all 4 fields
Call the VAC unlimited or similar (in the documentation or examples), makes it easier for users to understand.
| return nil | ||
| } | ||
|
|
||
| if err := parseParam(nvmeof.RwIosPerSecond, nvmeof.RwIosPerSecond, &qos.RwIosPerSecond); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what will be the behavior if 0 is set for these keys. is it considered as infinite or disabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is considered as infinite. but "disable QoS" means remove the limitation, so I think it is same terms
internal/nvmeof/qos.go
Outdated
| RwIosPerSecond = "nvmeof_rw_ios_per_second" | ||
| RwMbytesPerSecond = "nvmeof_rw_mbytes_per_second" | ||
| RMbytesPerSecond = "nvmeof_r_mbytes_per_second" | ||
| WMbytesPerSecond = "nvmeof_w_mbytes_per_second" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes lets use CephCSI specifc keys for user facing configuration and translate it internall to keep the keys standard
3717609 to
dd1172a
Compare
4f707dd to
9abf91b
Compare
Madhu-1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small nits, LGTM
9abf91b to
e1f6bea
Compare
|
This pull request has been removed from the queue for the following reason: The merge conditions cannot be satisfied due to failing checks: You can check the last failing draft PR here: #5756. You may have to fix your CI before adding the pull request to the queue again. |
init that file. in the future add more nvmeof errors to this file. Signed-off-by: gadi-didi <[email protected]>
make getNVMeoFMetadata() returns nvmeof error codes. Signed-off-by: gadi-didi <[email protected]>
Qos for nvmeof namespace is added. allow the user limit the namesapce capabilities. Signed-off-by: gadi-didi <[email protected]>
added because want to reuse the private function parseQosParams() Signed-off-by: gadi-didi <[email protected]>
if qos params are provided, call to qos grpc after ns was created. Signed-off-by: gadi-didi <[email protected]>
the purpose of that function is to modify the qos for namesapce on the fly. Signed-off-by: gadi-didi <[email protected]>
modify_volume capability is added to nvmeof driver in order to call ControllerModifyVolume(). Signed-off-by: gadi-didi <[email protected]>
Add EXPAND_VOLUME capability and stub implementation to allow csi-resizer to start and handle VolumeAttributesClass modifications. Signed-off-by: gadi-didi <[email protected]>
add ControllerModifyVolume to request ID extraction for proper log correlation. Signed-off-by: gadi-didi <[email protected]>
add unit tests with multiple cases. Signed-off-by: gadi-didi <[email protected]>
e1f6bea to
f482113
Compare
Pull request has been modified.
|
@Mergifyio requeue |
✅ The queue state of this pull request has been cleaned. It can be re-embarked automatically |
|
@Mergifyio queue |
✅ The pull request has been merged automaticallyThe pull request has been merged automatically at 0c9e11e |
Describe what this PR does
Adds QoS (Quality of Service) support for NVMe-oF CSI driver, allowing users to set IOPS and bandwidth limits on volumes at creation time and modify them dynamically at runtime.
Is there anything that requires special attention
New StorageClass parameters:
rwIosPerSecond- Read/Write IOPS limitrwMbytesPerSecond- Read/Write bandwidth limit (MB/s)rMbytesPerSecond- Read bandwidth limit (MB/s)wMbytesPerSecond- Write bandwidth limit (MB/s)Key decisions:
ControllerModifyVolume()RPC for runtime QoS changes via VolumeAttributesClass (GA at K8s 1.34+you can read more about it here: volume-attributes-classes)
ControllerExpandVolume()withEXPAND_VOLUMEcapability to satisfy csi-resizer requirements (csi-resizer needs this capability to start, even when only using it for modify operations)GetControllerPublishSecretRef()for credential retrieval inControllerModifyVolume()as a temporary solution (TODO: implementGetControllerExpandSecretRef()with proper NVMeoF support in ceph-csi-config)Credential handling note:
Since CSI spec doesn't define
csi.storage.k8s.io/controller-modify-secret-nameparameter and csi-resizer doesn't pass secrets toControllerModifyVolume()according to storageclass-secrets, . The implementation retrieves credentials from ceph-csi-config ConfigMap using the existingcontrollerPublishSecretReffield. This follows the same pattern as other CSI operations.Example workflow:
kubectl patch pvc my-pvc -p '{"spec":{"volumeAttributesClassName":"high-performance"}}'Result: Volume QoS updated to 50000 IOPS, 500 MB/s without recreation or downtime.
NOTE:
Related issues
#5694
Future concerns
controllerExpandSecretReffield OR AddcontrollerExpandSecretReffor the existingRBDstruct. (because NVMeoF is just a wrapper to RBD).GetControllerExpandSecretRef()to replace temporary use ofGetControllerPublishSecretRef()Checklist:
guidelines in the developer
guide.
Request
notes
updated with breaking and/or notable changes for the next major release.
Show available bot commands
These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:
/retest ci/centos/<job-name>: retest the<job-name>after unrelatedfailure (please report the failure too!)