Runtime Assisted Mount and Management enhancements #526

ddebroy · 2022-10-12T16:04:16Z

CSI spec enhancements to support deferral of file system mount and management operations from CSI plugins to container runtime handlers (detailed in kubernetes/enhancements#2893)

Introduce capabilities for CSI plugins to report support to defer file system mount and management operations to a container runtime handler
Introduce fields for a CO to report file system operations should be deferred (once CO determines CSI plugin and container runtime handler of workload are compatible and should be able to coordinate).

This does not specify how CSI plugins defer file system operations to a container runtime handler. Please see https://github.com/kubernetes/enhancements/pull/2893/files#diff-d29b488a06a8e4aa819829e0634c0affc40f4c4a35fa22738a7d02039ca50128R1281 for an outline of a common protocol that will allow CSI plugins and container runtime handlers to coordinate mount and management operations for file systems.

The main use-case of this right now is microvm based runtimes like Kata which benefit (from a perf and security perspective) when controlling the mounting of file systems on volumes (backed by block devices) rather than having the mount managed in the host and projected to the sandbox environment (using 9p or virtio-fs)

Existing CSI plugins in Kubernetes environments are achieving the above by performing API server lookups of the pod spec and associated runtime class (e.g. https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/ddef5e99b24da6e07a21184803bd37c16ade5686/pkg/disk/nodeserver.go#L241). This enhancement will allow the relevant information to be passed to the plugin without requiring the API server lookup of pod details.

Signed-off-by: Deep Debroy <[email protected]>

saad-ali · 2023-01-20T07:37:04Z

@jdef @bswartz Can you help review this?

bswartz

I'm finding the wording of the spec change confusing based on my high-level understanding of how this is supposed to work. Please correct my understanding if this is wrong:

SPs may support the ability to short-circuit the node operations (just do the non-FS operations and then stop) by advertising some new capabilities and receiving instructions from the CO on a per-attachment basis. The CO handles the coordination between the container runtime and the storage plugin. At no time does the SP communicate with any CRI.

The wording in the PR suggests that the SP does communicate with the CRI and I think that's unworkable. Is this a misunderstanding on my part or unclear wording that needs to be cleaned up?

ddebroy · 2023-01-20T20:16:31Z

@bswartz the overall flow for this will involve:

For NodePublish, SPs calling APIs on a common proxy (CRuSt proxy) to deposit metadata files about the file system operations for the volume in a known location on the host (that runtime handlers can process)
For NodeGetVolumeStats and NodeExpandVolume, SPs calling APIs on the common proxy (CRuSt proxy) to retrieve the data from a runtime handler.

Does the common proxy model work? If so, should I clarify the details about that in the CSI spec? The specifics about the proxy and how CSI plugins and runtimes will interact with it is described in the KEP here. Happy to incorporate a summary of it here if it is acceptable.

bswartz · 2023-01-20T21:20:23Z

@bswartz the overall flow for this will involve:
1. For NodePublish, SPs calling APIs on a common proxy ([CRuSt](https://github.com/kubernetes/enhancements/pull/2893/files#diff-d29b488a06a8e4aa819829e0634c0affc40f4c4a35fa22738a7d02039ca50128R1081) proxy) to deposit metadata files about the file system operations for the volume in a known location on the host (that runtime handlers can process)

2. For NodeGetVolumeStats and NodeExpandVolume, SPs calling APIs on the common proxy (CRuSt proxy) to retrieve the data from a runtime handler.
Does the common proxy model work? If so, should I clarify the details about that in the CSI spec? The specifics about the proxy and how CSI plugins and runtimes will interact with it is described in the KEP here. Happy to incorporate a summary of it here if it is acceptable.

Is it not possible to limit the communication that the SP has to do to just the CSI RPCs? I'm concerned that by mandating reliance on another standard, this feature ends up being limiting rather than empowering. For example, rather than the SP talking to CRuSt, couldn't the SP return the relevant info to the CO and then the CO could pass it to CRuSt? Such a model would enable other COs to do smarter things and possibly evolve beyond CRuSt. If we tie this feature to CRuSt and CRuSt stalls at some point in the future, then CSI is tied to a dead or dying thing. I'm thinking 5-10 years into the future here.

I realize that it might add to the complexity of the CSI RPCs to have special flavors of the relevant functions for the defered-FS calls, but extra RPCs are less of a liability than mandating usage of another standard. I've only started reading the KEP, but this is my main concern and I'm hoping there's a way to structure the layers so that CSI remains self-contained and the new interactions are pushed up into the CO (kubelet in this case).

ddebroy · 2023-01-20T21:37:47Z

rather than the SP talking to CRuSt, couldn't the SP return the relevant info to the CO and then the CO could pass it to CRuSt? Such a model would enable other COs to do smarter things and possibly evolve beyond CRuSt.

This is a great suggestion and makes complete sense! Thanks, @bswartz. Let me explore that and adapt this PR (and the KEP) to the suggested flow.

ameade · 2023-02-17T14:27:46Z

I almost had the same question as bswartz but looks like yall are thinking about it. I was thinking instead of the SP "deferring" the FS operations by calling a proxy that does it, perhaps it could be more like "skip" FS operations, then have the CO do the FS operations, perhaps by way of CRuSt. That might work since I assume the FS operations would be the last step and any of these operations anyways (NodeExpandVolume, NodeStageVolume, etc).

xing-yang · 2023-03-08T21:07:18Z

This file is missing: lib/go/csi/csi.pb.go
It should be generated automatically.

ddebroy force-pushed the ram1 branch 3 times, most recently from 9948b6e to 6f051a7 Compare October 12, 2022 20:14

ddebroy changed the title ~~Runtime Assisted Mount and Manamgent enhancements~~ Runtime Assisted Mount and Management enhancements Oct 31, 2022

ddebroy force-pushed the ram1 branch from 6f051a7 to 8937630 Compare November 9, 2022 16:44

Runtime Assisted Mount and Manamgent enhancements

1b83cca

Signed-off-by: Deep Debroy <[email protected]>

ddebroy force-pushed the ram1 branch from 8937630 to 1b83cca Compare November 9, 2022 16:46

bswartz requested changes Jan 20, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Runtime Assisted Mount and Management enhancements #526

Runtime Assisted Mount and Management enhancements #526

ddebroy commented Oct 12, 2022 •

edited

Loading

saad-ali commented Jan 20, 2023

bswartz left a comment

ddebroy commented Jan 20, 2023 •

edited

Loading

bswartz commented Jan 20, 2023 •

edited

Loading

ddebroy commented Jan 20, 2023 •

edited

Loading

ameade commented Feb 17, 2023

xing-yang commented Mar 8, 2023

Runtime Assisted Mount and Management enhancements #526

Are you sure you want to change the base?

Runtime Assisted Mount and Management enhancements #526

Conversation

ddebroy commented Oct 12, 2022 • edited Loading

saad-ali commented Jan 20, 2023

bswartz left a comment

Choose a reason for hiding this comment

ddebroy commented Jan 20, 2023 • edited Loading

bswartz commented Jan 20, 2023 • edited Loading

ddebroy commented Jan 20, 2023 • edited Loading

ameade commented Feb 17, 2023

xing-yang commented Mar 8, 2023

ddebroy commented Oct 12, 2022 •

edited

Loading

ddebroy commented Jan 20, 2023 •

edited

Loading

bswartz commented Jan 20, 2023 •

edited

Loading

ddebroy commented Jan 20, 2023 •

edited

Loading