Cannot make nsjail work on cgroupsv2 system #196
Comments
Can you show your |
It is in a docker container. What needs to be different? |
The issue persists even if we bind the cgroup mount point to the container:
With the mount nsjail can write to memory.max, but can't move process to the created group. To be fair, the issue occurs also outside of nsjail. Moving process to the new cgroup manually doesn't seem to work in privileged docker container. As expected it does work outside the container. Do you know why is that and how to overcome this? What permissions are needed to move a process to cgroup? |
EDIT: below you can see some diagnosis of your issues, but I am wondering: is there any particular reason you want to use nsjail with cgroups v2 instead of v1? Docker enables lots of options that may influence whether you can or cannot do a certain operation and for example even if you use the
Fwiw, it is hard to diagnose your issues without more details about the commands you executed or the environment you ran them in. But anyway, let's try to help :). I have tried to reproduce your issues on my side on Ubuntu 21.04 and my first issue was that
On my side, this is because I have both cgroups v1 and v2 and v2 is mounted in a different path:
I was able to resolve this issue with the
But as we can see, now I am getting the error that @carlbordum was getting:
So what happens here? Well, while the cgroup v2 memory controller does indeed expose such a file, it does not exist on my side because... I don't have the cgroup v2 memory controller enabled, or even available! :( We can see that here, as according to this kernel documentation page the
But it shows nothing instead! So why is that? Why are there no cgroupv2 controllers available? If I understand correctly, this is related to what they write here:
It seems that a given controller may be bound either to v1 or to v2 but never to both of them. I guess this kinda makes sense, and just to recap, my memory controller is indeed bound to v1 as what my
So if you are in the same situation as me, I guess the easiest is to change kernel boot parameters and add |
@robertswiecki with what we see above, I guess we could improve nsjail UX by:
|
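The check described above (whether the v2 hierarchy offers the memory controller at all) can be sketched as follows. This is a minimal illustration, not part of nsjail: the function names are hypothetical, and the path /sys/fs/cgroup/cgroup.controllers is an assumption that only holds when cgroup v2 is mounted at /sys/fs/cgroup. On a hybrid v1/v2 system the v2 mount may live elsewhere.

```python
from pathlib import Path


def parse_controllers(text: str) -> set[str]:
    """Parse the space-separated contents of a cgroup.controllers file."""
    return set(text.split())


def missing_controllers(text: str, needed: set[str]) -> set[str]:
    """Return the controllers in `needed` that the file does not list."""
    return needed - parse_controllers(text)


if __name__ == "__main__":
    # Assumed location of the v2 root; adjust if v2 is mounted elsewhere.
    path = Path("/sys/fs/cgroup/cgroup.controllers")
    text = path.read_text() if path.exists() else ""
    print("missing:", sorted(missing_controllers(text, {"memory", "pids", "cpu"})))
```

An empty file, as in the analysis above, means every requested controller is reported as missing.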
Wow, I am completely blown away by your helpfulness for such a poor "bug" report. I am specifically working on this little project. It is very reproducible, so I was confused about why it stopped working, but I think it's because my systems now run the cgroupv2 controller. Is there any decent way to run
edit: if you want, you can clone the project and run |
@disconnect3d thank you for such cool in-depth analysis. I'm vaguely familiar with cgroups2 myself, but I guess I can take a look at what can be improved here. Though, if anyone will beat me to that, I won't complain :) |
We're also seeing issues in trying to get nsjail running for Compiler Explorer on newer cgroups (on Ubuntu 22.04):
(or with
is what we see, which may be similar. We wouldn't choose to run cgroups2 but Ubuntu 22.04 seems to have made it the default, and it's easier not to special case boot params to get it back to the old system. |
In my case:
so it doesn't quite seem the same issue as others have seen, though I'm having trouble with the
That said: with a bit more hacking and fiddling with settings I supplied a different cgroups2 mount (including the
and ...
so it looks like I ought to be able to write to the file. |
At least for my use-case (running I worked some on a fix for this in my fork. All we really need to do is look at the root cgroup.subtree_control and make sure the controllers we need are there. If they aren't there, we need to add them (in fact, this is exactly what redpwn/jail does). A minor issue is that in order to modify My patch is here, and works for my use case, but likely needs some work to be useful to others: master...ndrewh:cgroupsv2-fix (Footnote: If you're going to try my fork, nsjail needs to be the root process in the cgroup, you can accomplish this by invoking nsjail using the execve-variant of the |
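The idea in the comment above (read the root cgroup.subtree_control and add whatever controllers are missing) can be sketched like this. It is not the code from the linked patch; the function name is hypothetical. The only kernel interface relied on is that writing space-separated tokens such as "+memory" to cgroup.subtree_control enables those controllers for child cgroups.

```python
def subtree_control_request(current: str, needed: list[str]) -> str:
    """Build the string to write to cgroup.subtree_control.

    `current` is the present content of cgroup.subtree_control and `needed`
    lists the controllers that should be enabled for child cgroups. The result
    is empty when nothing has to change.
    """
    enabled = set(current.split())
    return " ".join(f"+{name}" for name in needed if name not in enabled)


if __name__ == "__main__":
    # Example: only "cpu" is enabled so far, memory and pids still need enabling.
    print(subtree_control_request("cpu\n", ["memory", "pids", "cpu"]))
```

If such a write is rejected, the kernel's cgroup v2 documentation describes a "no internal processes" rule for non-root cgroups that may be the cause.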
@ndrewh I'm not running on docker in my case; this is "just" on a plain Ubuntu 22.04 system |
I looked at this a little more, since I know I've run into issues running on stock 22.04 as well. I tried this on a 22.04 desktop in virtualbox. I did see slightly different initial behavior in AWS, but I think what's here should still be helpful.
Example 1: Running as root?
If you just straight up run
Fix:
(If you try the same thing on my fork, it'll do this last line for you. Whether this is desirable behavior in general or not, I am not sure.) Example 2: Creating a cgroup, running non-root (by adding user to
|
Thanks all for working on this and contributing. My cgroup1/2-foo is not great, but from what I can tell it works as expected.
./nsjail --config configs/bash-with-fake-geteuid.cfg --detect_cgroupv2 --cgroup_cpu_ms_per_sec 100 --cgroupv2_mount /sys/fs/cgroup/user.slice/user-1000.slice/user\@1000.service/
...
[JAILED-BASH:21:33:03:sh-5.2.2:/tmp]# openssl speed
And I can see that only 10% of a single CPU core is used (via top and with
Without cgroups
[JAILED-BASH:21:35:48:sh-5.2.2:/tmp]# openssl speed
Doing md5 for 3s on 16 size blocks: 19115100 md5's in 3.00s
With cgroups
[JAILED-BASH:21:36:05:sh-5.2.2:/tmp]# openssl speed
Doing md5 for 3s on 16 size blocks: 2666542 md5's in 0.39s
Not exactly 10%, but close enough, assuming the cores were not isolated for the test. |
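For reference, the arithmetic behind the 10% figure can be sketched as below. The cgroup v2 cpu.max file takes a quota and a period in microseconds. That a value of 100 for --cgroup_cpu_ms_per_sec ends up as "100000 1000000" is an assumption about how the flag maps onto cpu.max, and the helper names are hypothetical.

```python
PERIOD_US = 1_000_000  # one second, expressed in microseconds


def cpu_max_line(cpu_ms_per_sec: int) -> str:
    """Format a cpu.max value ("<quota> <period>") for a ms-per-second budget."""
    quota_us = cpu_ms_per_sec * 1000  # milliseconds to microseconds
    return f"{quota_us} {PERIOD_US}"


def cpu_fraction(line: str) -> float:
    """Fraction of one CPU allowed by a cpu.max line; 'max' means unlimited."""
    quota, period = line.split()
    if quota == "max":
        return float("inf")
    return int(quota) / int(period)


if __name__ == "__main__":
    line = cpu_max_line(100)
    print(line, cpu_fraction(line))
```

If the 0.39s in the benchmark output is CPU time accumulated over the 3s window, the observed share is 0.39 / 3.00 = 0.13, which matches the "not exactly 10%" remark.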
Some progress at least, with the
It's not clear to me why I need to pass |
@mattgodbolt An unfortunate fix:
yup, argument order apparently matters (relative to the BTW - all of these options can be specified in the cfg file as well. Thank you for Compiler Explorer! ❤️ |
Yup, the current way of parsing args is to run file config parsing at the moment the
I thought it was a clever way of doing things, but clever is not always the best :). However, changing it now would a) break backwards compatibility and b) not be so easy to implement, because two passes over the cmdline arguments would be needed (first file, then args), or some way of caching them. |
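A toy model of why argument order matters with single-pass parsing is sketched below. It is not nsjail's parser: the CONFIG_FILE dictionary, the function name, and the assumption that loading a config file resets fields the file does not set are all hypothetical, chosen only to illustrate the behaviour discussed here.

```python
# Hypothetical stand-in for a parsed config file. Fields the file does not set
# still carry default values, so loading the file resets them.
CONFIG_FILE = {"use_cgroupv2": False, "cgroupv2_mount": "/sys/fs/cgroup"}


def parse_single_pass(argv: list[str]) -> dict:
    """Apply arguments strictly left to right, loading the config when seen."""
    settings = {"use_cgroupv2": False, "cgroupv2_mount": "/sys/fs/cgroup"}
    args = iter(argv)
    for arg in args:
        if arg == "--config":
            next(args)  # the config path; ignored in this toy model
            settings.update(CONFIG_FILE)  # overwrites anything set earlier
        elif arg == "--use_cgroupv2":
            settings["use_cgroupv2"] = True
        elif arg == "--cgroupv2_mount":
            settings["cgroupv2_mount"] = next(args)
        else:
            raise ValueError(f"unknown argument: {arg}")
    return settings


if __name__ == "__main__":
    print(parse_single_pass(["--use_cgroupv2", "--config", "x.cfg"]))
    print(parse_single_pass(["--config", "x.cfg", "--use_cgroupv2"]))
```

In this model a flag placed before --config is overwritten when the file is loaded, while the same flag placed after it survives.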
thanks all:
is now more in line with the other stuff here I think?
right! I'm just trying to use the existing config as it's far easier to supply a couple extra cmdline flags on a v2 system than it is to have two config files, one for v1 and one for v2 (Ideally I can support both as we transition).
You're so welcome! --
makes sense to me! thanks! :) |
The point of As for the permissions error, I think you're closest to the "Example 2" in my previous comment. My guess is nsjail does not have permission to move the child out of the current cgroup. You can fix this by either (1) spawning nsjail inside a cgroup it has permissions to move children out of (e.g. via cgexec or Docker), or (2) modifying the permissions on the cgroup.procs file for nsjail's current cgroup (probably either the root one or the one associated with your terminal). |
Awesome! Thanks that clears up a few things. I'll try fiddling with settings on 22.04 to see if I can work out what environmental things need changing both for me as a user and then also in the VM for the site (which can be more bespoke) Cheers! |
@ndrewh I was able to get things working with that
That works for my specific use case, but more generally, on a multi-tenant system, is there any way this can be made to work, do you think? Is that an Ubuntu issue? |
@mattgodbolt I don't think it's an Ubuntu issue, I think it's just that you need nsjail to be in a cgroup that it has permissions to move its child processes out of. I think the following should work on a multi-tenant system: Make a new cgroup:
Run nsjail in that new cgroup
(note: if you |
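The two steps above (make a cgroup, then place a process in it) come down to creating a directory and writing a PID to its cgroup.procs file. A minimal sketch follows; the helper names and the cgroup name used in the example are hypothetical, and the demonstration runs against a temporary directory rather than the real cgroup filesystem, where root or delegated ownership would be required.

```python
import os
import tempfile
from pathlib import Path


def create_cgroup(parent: Path, name: str) -> Path:
    """Create a child cgroup directory under `parent` and return its path."""
    cgroup_dir = parent / name
    cgroup_dir.mkdir(exist_ok=True)
    return cgroup_dir


def move_pid(cgroup_dir: Path, pid: int) -> None:
    """Ask the kernel to migrate `pid` by writing it to cgroup.procs."""
    (cgroup_dir / "cgroup.procs").write_text(f"{pid}\n")


if __name__ == "__main__":
    # A temporary directory stands in for a real cgroup v2 mount here.
    with tempfile.TemporaryDirectory() as tmp:
        cg = create_cgroup(Path(tmp), "nsjail-parent")
        move_pid(cg, os.getpid())
        print((cg / "cgroup.procs").read_text().strip())
```

On a real hierarchy the write can fail with a permission error even when the target file is writable: the kernel's delegation rules also require write access to cgroup.procs in the common ancestor of the source and destination cgroups.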
I think I see. Thanks @ndrewh. Seems unfortunate to have to do the two steps (and specify the weird mount point thing too) but looks like it can be made to work. I'll have to see if that also works on cgroupv1 (I presume it does). |
I'll try to summarize (hopefully correctly) here in case someone finds this later:
These groups do not have to be the same. It sounds like for many applications you could just as well create two cgroups:
The user would need full ownership of
Note: I don't think this trick improves the situation in a default (but privileged) Docker container, where your best bet is making sure that nsjail is the root process (and then nsjail will move itself to create a 2-group scenario similar to above). |
Hi! I'm having similar issues described here, getting the error message
although the file exists when checking with I am running inside Docker (26.1.2) with Output of
Output of
The full config can be found here. I tried to play around with it and couldn't figure it out. It seems to work under cgroupv1 on Debian Bookworm. Any help is greatly appreciated. |
@Gregofi I believe |
@Gregofi Sorry, I realized I gave a completely bogus answer... you used detect_cgroupv2 and it still didn't work.
The full log might be helpful here, but it's trying to move the child into the cgroup which it just created... not sure why this would fail, since it's quite literally doing one after the other: Lines 255 to 256 in a00a0ef
Some troubleshooting guesses:
Best of luck! |
Hi, thanks for your response. Yes, upon reading your comment I also suspect docker permissions. I tried various things. However using the explicit The first error, ending with |
There was a problem running on a cgroupv2 (arch) host. For more, see google/nsjail#196.
For what it's worth we've been able to get this working in our systems now. But we've hit a new issue when updating to an even newer Ubuntu: #236 |
For example, when I run nsjail with --use_cgroupv2 --cgroupv2_mount /sys/fs/cgroup/NSJAIL, I still see errors like
If I understand cgroups v2 correctly, it should look for /sys/fs/cgroup/NSJAIL/memory.max, not /sys/fs/cgroup/NSJAIL/NSJAIL.10/memory.max. /sys/fs/cgroup/NSJAIL exists.
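The path in the report suggests that a per-jail child cgroup is created beneath the configured mount, rather than the limit being written into the mount directory itself. The sketch below reproduces that path shape. The "NSJAIL.<pid>" naming is inferred from the reported path, reading the trailing number as a PID is an assumption, and the function name is hypothetical.

```python
from pathlib import PurePosixPath


def jail_cgroup_file(mount: str, pid: int, filename: str) -> str:
    """Compose <mount>/NSJAIL.<pid>/<filename>, the shape seen in the report."""
    return str(PurePosixPath(mount) / f"NSJAIL.{pid}" / filename)


if __name__ == "__main__":
    print(jail_cgroup_file("/sys/fs/cgroup/NSJAIL", 10, "memory.max"))
```

Under this reading, memory.max only appears inside the child directory when the memory controller is enabled in the parent's cgroup.subtree_control, which is consistent with the diagnosis in the comments.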