Use WMI to implement Volume API to reduce PowerShell overhead #360

Merged
merged 1 commit into from
Apr 22, 2025

@laozc
Contributor

laozc commented Oct 22, 2024

/kind feature

What this PR does / why we need it:
This PR uses WMI interfaces to replace the PowerShell functions for Windows storage operations,
which improves the overall performance of csi-proxy.

I ran some tests on a setup with two Windows nodes: one running csi-proxy with PowerShell, the other running csi-proxy with WMI.
The test creates two pods side by side and deletes the resources from the cluster once they are running.

  • Kubernetes: version 1.31.4
  • CSI: vSphere CSI
  • CNI: Antrea
  • Windows Node Spec: 4 vCPU, 16GB memory

Here is the performance data collected from the gRPC server metrics within a 10-minute time frame.

| Operation | Before (sec) | After (sec) | Before, min (sec) | After, max (sec) | Improvement |
|---|---|---|---|---|---|
| FormatVolume (1G disk) | 5.85-8.85 | 2.90 | 5.85 | 2.90 | -101% |
| GetDiskNumberFromVolumeID | 2.90 | 0.29-0.58 | 2.90 | 0.58 | -400% |
| GetVolumeIDFromTargetPath | 0.58-0.96 | 0.009 | 0.58 | 0.009 | -6344% |
| GetVolumeStats | 2.90 | 0.29 | 2.90 | 0.29 | -900% |
| IsVolumeFormatted | 2.90 | 0.0955 | 2.90 | 0.0955 | -2937% |
| ListDiskIDs | 2.9-5.85 | 0.585 | 2.90 | 0.585 | -396% |
| ListVolumeOnDisk | 2.9 | 0.29-0.585 | 2.90 | 0.585 | -396% |
| MountVolume | 2.9-5.85 | 0.00955-0.0955 | 2.90 | 0.09 | -3122% |
| PartitionDisk (1G disk) | 19.5 | 2.9 | 19.5 | 2.9 | -572% |
| SetDiskState | 2.9-5.70 | 0.0955-0.280 | 2.90 | 0.280 | -936% |
| UnmountVolume | 5.85-8.85 | 0.29-0.585 | 5.85 | 0.585 | -900% |

The data shows that PowerShell takes around 2.9s to load in this environment, and its memory consumption can reach up to 200MB, which makes it the bottleneck for volume mounts on Windows.
By removing the PowerShell overhead, we were able to cut the time cost of the method calls significantly for all operations.
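For readers checking the numbers: the Improvement column appears to be computed from the two boundary columns as (after max - before min) / after max, expressed as a percentage. The sketch below reproduces that arithmetic under this assumed reading of the table. The function name `improvementPercent` is illustrative and is not part of csi-proxy.

```go
package main

import (
	"fmt"
	"math"
)

// improvementPercent returns the relative change from the fastest "before"
// latency to the slowest "after" latency, as a percentage of the "after"
// value and rounded to the nearest integer. This formula is inferred from
// the table values, not taken from the PR's code.
func improvementPercent(beforeMin, afterMax float64) float64 {
	return math.Round((afterMax - beforeMin) / afterMax * 100)
}

func main() {
	// GetVolumeStats: 2.90s before, 0.29s after (table lists -900%).
	fmt.Printf("GetVolumeStats: %.0f%%\n", improvementPercent(2.90, 0.29))
	// GetDiskNumberFromVolumeID: 2.90s before, 0.58s after (table lists -400%).
	fmt.Printf("GetDiskNumberFromVolumeID: %.0f%%\n", improvementPercent(2.90, 0.58))
}
```

Read this way, -900% means the PowerShell path took about ten times as long as the WMI path for that call.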

Which issue(s) this PR fixes:
Ref #193

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 22, 2024
@k8s-ci-robot k8s-ci-robot requested review from humblec and pohly October 22, 2024 03:54
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 22, 2024
@k8s-ci-robot
Contributor

Welcome @laozc!

It looks like this is your first PR to kubernetes-csi/csi-proxy 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-csi/csi-proxy has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 22, 2024
@k8s-ci-robot
Contributor

Hi @laozc. Thanks for your PR.

I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Oct 22, 2024
@andyzhangx
Member

thanks for the PR. @laozc have you tested the WMI based operation manually? does it work?

@laozc
Contributor Author

laozc commented Oct 23, 2024

Yes. I verified it in my local environment, and the WMI calls work the same as the PowerShell commands.
I'm still finalizing the PR with more testing and integrating it into a CSI driver to get some results.

@mauriciopoppe
Member

/cc @andyzhangx @mauriciopoppe
/uncc @humblec @pohly

@k8s-ci-robot k8s-ci-robot requested review from andyzhangx and mauriciopoppe and removed request for pohly and humblec October 23, 2024 15:02
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 28, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 1, 2024
@laozc laozc force-pushed the wmi branch 8 times, most recently from 7ee9449 to c55b906 Compare November 10, 2024 10:24
@laozc laozc marked this pull request as ready for review November 10, 2024 10:35
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 10, 2024
@andyzhangx
Member

@laozc I tried to copy the cim folder into my Azure Disk CSI driver repo (for testing): https://github.com/kubernetes-sigs/azuredisk-csi-driver/compare/master...andyzhangx:azuredisk-csi-driver:cim?expand=1, but I got the following error. Have you hit a similar error in your CSI driver?

https://github.com/andyzhangx/azuredisk-csi-driver/actions/runs/14463170910/job/40559504163

# sigs.k8s.io/azuredisk-csi-driver/pkg/os/cim
FAIL	sigs.k8s.io/azuredisk-csi-driver/pkg/os/cim [setup failed]
package sigs.k8s.io/azuredisk-csi-driver/pkg/os/cim
	imports github.com/microsoft/wmi/pkg/wmiinstance
	imports golang.org/x/sys/windows: build constraints exclude all Go files in /home/runner/work/azuredisk-csi-driver/azuredisk-csi-driver/vendor/golang.org/x/sys/windows
# sigs.k8s.io/azuredisk-csi-driver/pkg/os/cim
package sigs.k8s.io/azuredisk-csi-driver/pkg/os/cim
	imports github.com/microsoft/wmi/server2019/root/microsoft/windows/storage
	imports github.com/microsoft/wmi/pkg/base/instance
	imports github.com/microsoft/wmi/pkg/base/session: build constraints exclude all Go files in /home/runner/work/azuredisk-csi-driver/azuredisk-csi-driver/vendor/github.com/microsoft/wmi/pkg/base/session

@laozc
Contributor Author

laozc commented Apr 15, 2025

Hi, the Go files in github.com/microsoft/wmi/pkg/base/instance and golang.org/x/sys/windows carry a build constraint so that they only build under GOOS=windows, so they should be excluded from UT.
I don't think the code in pkg/os/cim that calls WMI can be tested in the UT stage, as it only wraps WMI.
The E2E cases should cover the functionality.

You may either run
go test -covermode=atomic -coverprofile=profile.cov $(go list ./... | grep -v sigs.k8s.io/azuredisk-csi-driver/pkg/os/cim)
to exclude this package, or add

//go:build windows
// +build windows

to each file in this package so that these files are skipped for test coverage on Linux.
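To illustrate why a `//go:build windows` line removes a file from a Linux build or test run, here is a minimal sketch that evaluates the constraint with the standard library's `go/build/constraint` package. The helper `fileIncluded` is hypothetical and models only the GOOS tag. The real go tool also considers GOARCH, cgo, release tags, and file-name suffixes.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// fileIncluded reports whether a file carrying the given build-constraint
// line would be selected when goos is the only satisfied tag.
// Hypothetical helper for illustration; not part of csi-proxy.
func fileIncluded(buildLine, goos string) (bool, error) {
	expr, err := constraint.Parse(buildLine)
	if err != nil {
		return false, err
	}
	// Eval asks the callback whether each tag in the expression is satisfied.
	return expr.Eval(func(tag string) bool { return tag == goos }), nil
}

func main() {
	for _, goos := range []string{"linux", "windows"} {
		ok, err := fileIncluded("//go:build windows", goos)
		if err != nil {
			panic(err)
		}
		fmt.Printf("GOOS=%s included=%v\n", goos, ok)
	}
}
```

When every file of a package is excluded this way, the go tool reports "build constraints exclude all Go files", which matches the error in the CI log above.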

@laozc
Contributor Author

laozc commented Apr 15, 2025

I could add build constraints to these files if downstream projects need them.

//go:build windows
// +build windows

@andyzhangx
Member

//go:build windows
// +build windows

yes, that's necessary, and pls also add header, you could refer to
kubernetes-sigs/azuredisk-csi-driver@d442711

@laozc laozc force-pushed the wmi branch 2 times, most recently from ca57b91 to dd7bed2 Compare April 18, 2025 13:50
Member

@andyzhangx andyzhangx left a comment

/hold
I am trying the new cim calls

@andyzhangx
Member

@laozc do you have the link about how this WMI improves the performance? pls paste details in this PR description, thanks.

@laozc
Contributor Author

laozc commented Apr 21, 2025

I updated the PR description to include my benchmark result mentioned in #360 (comment)

@laozc
Contributor Author

laozc commented Apr 21, 2025

@andyzhangx Good to see the change works well on azuredisk-csi-driver.
Could you check if there is any other change needed on this PR?
I will rebase other PRs later.
The current review order is
Volume (#360) => Disk (#376) => System (#375) => iSCSI (#377) => SMB (#378)
after breaking the original PRs down on a per-API basis.

@andyzhangx
Member

@laozc do you have any debugging tips for calling WMI on a Windows node? We use PowerShell commands on Windows nodes, which makes debugging issues much easier.

@laozc
Contributor Author

laozc commented Apr 21, 2025

@andyzhangx You can still use PowerShell, just with a different cmdlet that queries CIM.
For example, to get the details of the first volume on the node:

PS > (Get-CimInstance  -Namespace "Root\Microsoft\Windows\Storage" -Query "SELECT * FROM MSFT_Volume")[0]


ObjectId             : {1}\\WIN-8E2EVAQ9QSB\root/Microsoft/Windows/Storage/Providers_v2\WSP_Volume.ObjectId=
                       "{b65bb3cd-da86-11ee-854b-806e6f6e6963}:VO:\\?\Volume{1781d1eb-2c0a-47ed-987f-c229b9c
                       02527}\"
PassThroughClass     :
PassThroughIds       :
PassThroughNamespace :
PassThroughServer    :
UniqueId             : \\?\Volume{1781d1eb-2c0a-47ed-987f-c229b9c02527}\
AllocationUnitSize   : 4096
DedupMode            : 4
DriveLetter          : C
DriveType            : 3
FileSystem           : NTFS
FileSystemLabel      :
FileSystemType       : 14
HealthStatus         : 1
OperationalStatus    : {53261}
Path                 : \\?\Volume{1781d1eb-2c0a-47ed-987f-c229b9c02527}\
Size                 : 536198770688
SizeRemaining        : 407553982464
PSComputerName       :

The query is the same as the one this PR issues in the implementation of QueryVolumeByUniqueID.
The MSFT_Volume class is documented at https://learn.microsoft.com/en-us/windows-hardware/drivers/storage/msft-volume, which lists all the available properties and methods.
PowerShell wraps the WMI class, so you can also use the result like a normal PowerShell data object.
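When debugging from Go code, it can help to log the equivalent PowerShell command for a WQL query so that it can be pasted into a console on the node. The sketch below is a rough illustration under stated assumptions. The helpers `escapeWQL`, `volumeQuery` and `debugCommand` are hypothetical names, and the `WHERE UniqueId = ...` filter is an assumption rather than something shown in this thread. It relies on WQL treating the backslash as the escape character inside string literals, which is why the backslashes in a volume path are doubled.

```go
package main

import (
	"fmt"
	"strings"
)

// Namespace used by the storage classes queried in this thread.
const storageNamespace = `Root\Microsoft\Windows\Storage`

// escapeWQL escapes backslashes and single quotes for use inside a
// single-quoted WQL string literal. Hypothetical helper for illustration.
func escapeWQL(s string) string {
	return strings.NewReplacer(`\`, `\\`, `'`, `\'`).Replace(s)
}

// volumeQuery builds a WQL query selecting one MSFT_Volume by UniqueId.
// The WHERE clause is an assumed filter, not copied from the PR.
func volumeQuery(uniqueID string) string {
	return fmt.Sprintf("SELECT * FROM MSFT_Volume WHERE UniqueId = '%s'", escapeWQL(uniqueID))
}

// debugCommand renders the Get-CimInstance call that runs the same query.
func debugCommand(namespace, query string) string {
	return fmt.Sprintf(`Get-CimInstance -Namespace "%s" -Query "%s"`, namespace, query)
}

func main() {
	q := volumeQuery(`\\?\Volume{1781d1eb-2c0a-47ed-987f-c229b9c02527}\`)
	fmt.Println(debugCommand(storageNamespace, q))
}
```

Logging this string next to a failing WMI call is one possible way to keep the PowerShell-style debugging workflow without routing production calls through PowerShell.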

If you need to call a static method, you can do it like this:

PS > $class = Get-CimClass -ClassName MSFT_iSCSITarget -Namespace "Root\Microsoft\Windows\Storage"
PS > Invoke-CimMethod -CimClass $class -MethodName "Connect" @{NodeAddress="iqn.1998-01.com.vmware:52eb662431c68fa9-2792e31529536a3a";AuthenticationType="MUTUALCHAP";ChapSecret="--";ChapUsername="zhongchengl";}

For everything in this PR except a few direct Win32 API calls, you can find a corresponding WMI query or invocation like the ones above.
I don't think debugging differs much from the non-WMI approach.

I believe the non-WMI approach works well for a human admin/user or for integration test cases. For a Go-based program such as CSI-proxy running on Windows nodes, it makes less sense to use PowerShell to automate these tasks, especially considering the overhead that .NET may bring.

Member

@andyzhangx andyzhangx left a comment

/lgtm
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 22, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 22, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx, laozc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 22, 2025
@k8s-ci-robot k8s-ci-robot merged commit f9750e0 into kubernetes-csi:master Apr 22, 2025
8 checks passed
@laozc laozc deleted the wmi branch April 22, 2025 11:41
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.