@gavinkflam
Contributor

@gavinkflam gavinkflam commented Jun 20, 2025

HugeTLB stats for cgroup v2 currently do not work when rsvd usage is present. When the .rsvd.current file exists, the code attempts to read a .rsvd.events file to obtain the failcnt stat. However, this file does not exist, resulting in the following error:

open /sys/fs/cgroup/pod_123.slice/pod_123-456.slice/hugetlb.2MB.rsvd.events: no such file or directory

I’ve verified in the kernel source code that the cgroup v2 HugeTLB controller does not create a .rsvd.events file.

https://github.com/torvalds/linux/blob/v6.15/mm/hugetlb_cgroup.c#L711-L756

P.S. This is blocking cri-o/cri-o#9257

@gavinkflam gavinkflam force-pushed the fix-cg2-hugetlb-stat branch 2 times, most recently from e3d9c0a to b502476 Compare June 23, 2025 03:22
@gavinkflam
Contributor Author

@AkihiroSuda @thaJeztah could you please take a look when you have a chance?

Tests are passing locally.

@thaJeztah thaJeztah requested a review from kolyshkin June 25, 2025 10:53
@lifubang
Member

lifubang commented Jul 5, 2025

I’ve verified in kernel source code that the cgroup v2 HugeTLB controller does not create a .rsvd.events file.

You are right, I've checked on my machine.

@gavinkflam
Contributor Author

Hi @kolyshkin, @AkihiroSuda, @haircommander,
could you PTAL if you get a chance?

@gavinkflam gavinkflam force-pushed the fix-cg2-hugetlb-stat branch from b502476 to 5e787da Compare July 11, 2025 00:31
@gavinkflam gavinkflam requested review from kolyshkin and lifubang July 11, 2025 00:31
Contributor

@kolyshkin kolyshkin left a comment

Overall looks good, left a single nit.

Also, please change libct/cg/fs2: to fs2: in the commit subject, as this is where it lives now (it used to be part of runc's libcontainer/cgroups, but now it's a separate repo).

I'm against these tests though; they are very unrealistic and do not actually test anything. I'd rather have something like a simple bats test which checks that hugetlb is available, then runs runc events --stats $CTID and checks that hugetlb is not empty.
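The core of such a check can be sketched in Go (this is not the bats test itself, just an illustration of the assertion it would make). The JSON layout used below, with a "stats" event carrying data.hugetlb keyed by page size, is an assumption about the runc events --stats output, and the sample payload is made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// event mirrors only the fields this check needs. The field names are an
// assumption about the JSON printed by `runc events --stats`.
type event struct {
	Type string `json:"type"`
	ID   string `json:"id"`
	Data struct {
		Hugetlb map[string]struct {
			Usage   uint64 `json:"usage"`
			Max     uint64 `json:"max"`
			Failcnt uint64 `json:"failcnt"`
		} `json:"hugetlb"`
	} `json:"data"`
}

// hugetlbPageSizes returns the sorted page sizes reported in a stats event.
// An empty result means the hugetlb section is missing or empty.
func hugetlbPageSizes(raw []byte) ([]string, error) {
	var ev event
	if err := json.Unmarshal(raw, &ev); err != nil {
		return nil, err
	}
	if ev.Type != "stats" {
		return nil, fmt.Errorf("unexpected event type %q", ev.Type)
	}
	sizes := make([]string, 0, len(ev.Data.Hugetlb))
	for size := range ev.Data.Hugetlb {
		sizes = append(sizes, size)
	}
	sort.Strings(sizes)
	return sizes, nil
}

func main() {
	// Made-up sample payload; a real test would capture the command output.
	sample := []byte(`{"type":"stats","id":"ctr","data":{"hugetlb":{"2MB":{"usage":4194304,"failcnt":0}}}}`)
	sizes, err := hugetlbPageSizes(sample)
	if err != nil {
		panic(err)
	}
	if len(sizes) == 0 {
		panic("hugetlb stats are empty")
	}
	fmt.Println("hugetlb page sizes:", sizes)
}
```

With the bug described in this PR, stats collection failed before any hugetlb data could be reported, so a check along these lines would have caught it on a host with rsvd usage present.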

@gavinkflam gavinkflam force-pushed the fix-cg2-hugetlb-stat branch from 5e787da to c1ab7c5 Compare July 11, 2025 00:40
@gavinkflam gavinkflam changed the title libct/cg/fs2: Fix statHugeTlb error when rsvd usage is present fs2: Fix statHugeTlb error when rsvd usage is present Jul 11, 2025
@gavinkflam
Contributor Author

gavinkflam commented Jul 11, 2025

Thank you, @kolyshkin. I've updated the code and reworded the commit message accordingly.

Regarding the tests, they are ported from fs/hugetlb_test.go, and other fs2 modules follow a similar testing pattern. I agree they could be more realistic, and what you described sounds like a solid direction.

Would you prefer that be addressed in this PR, or do you see it as a broader improvement for the library? Since there are currently no bats tests in the repo, introducing them would involve a fair amount of work on its own.

I'm also open to removing the tests and limiting this PR to the bug fix, given that this module didn't have any tests to begin with. What are your thoughts?

@kolyshkin
Contributor

Yes, let's remove the test, and then I'll add one to the runc repo.

@gavinkflam gavinkflam force-pushed the fix-cg2-hugetlb-stat branch from c1ab7c5 to 94067f2 Compare July 11, 2025 03:20
@gavinkflam
Contributor Author

Thanks @kolyshkin, I've removed the tests. Please take another look.

@gavinkflam gavinkflam requested a review from kolyshkin July 11, 2025 03:21
Member

@thaJeztah thaJeztah left a comment

LGTM

@kolyshkin @lifubang PTAL

Member

@lifubang lifubang left a comment

still LGTM

@gavinkflam
Contributor Author

@kolyshkin, does it look good to you? Shall we merge this, please?

Contributor

@kolyshkin kolyshkin left a comment

LGTM

@kolyshkin kolyshkin merged commit bb47798 into opencontainers:main Jul 14, 2025
14 checks passed
@gavinkflam gavinkflam deleted the fix-cg2-hugetlb-stat branch July 15, 2025 04:05
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Sep 24, 2025
@kolyshkin
Contributor

I'd rather have something like a simple bats test which checks that hugetlb is available, then runs runc events --stats $CTID and checks that hugetlb is not empty.

Better late than never, see opencontainers/runc#4898

kolyshkin added a commit to kolyshkin/runc that referenced this pull request Oct 8, 2025