I've never used or installed Jupyter notebooks before.
I ran into this on a 4-hour troubleshooting call while helping another engineer debug their environment.
The details I'll post are my recollection of that call, which ultimately resulted in fixing the issue. The notes will be incomplete, as I'm going off memory of bits and pieces seen over screen share, but I figured it'd be worth documenting the relevant notes I can recall while they're still fresh in my head, in case this helps anyone else in the future.
Since this wasn't my environment, and it's a tool I'm not familiar with:
I don't know how to reproduce it,
and I'm limited in the amount of detail I can share.
Adding the configuration mentioned in the docs (uid: 0, fsGid: 0, CHOWN_HOME: "yes") fixed EFS, but resulted in IAM breaking.
By IAM breaking I mean:
Running aws sts get-caller-identity in the Jupyter web interface's terminal would fail with a filesystem permission error: [Errno 13] Permission denied: '/var/run/secrets/eks.amazonaws.com/serviceaccount/token'
Important detail: the Docker image had logic in startup.sh & single user.sh that switched the user from root to jovyan on startup. The temporary root access was likely used to change the ownership of the home directory (EFS) to the jovyan user.
The reason it was broken seemed to be that the script changed the active shell user, and the ownership of most files on the container's file system, to jovyan, BUT a key file related to IAM was still owned by root as a result of the container starting off as the root user: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
It was owned by the root user and group (root:root, per ls -lah /var/run/secrets/eks.amazonaws.com/serviceaccount/token),
so the jovyan user didn't have access.
(We played around a bit and found that even if you override the kube YAML defaults, which list that mount as read-only, it stays read-only due to the nature of the mount, so its permissions can't be updated at run time, only established at container creation time.)
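To tell which of these states a pod is in, a check along the following lines could be run from the web terminal. This is only a sketch: `check_token_access` is a hypothetical helper name, not something from the image or the chart, and it only reports ownership and readability without changing anything.

```shell
# Hypothetical helper: show the numeric owner/group of the service account
# token and whether the current user can read it.
check_token_access() {
  token_path="${1:-/var/run/secrets/eks.amazonaws.com/serviceaccount/token}"
  if [ ! -e "$token_path" ]; then
    echo "missing: $token_path"
    return 2
  fi
  # -n prints numeric uid:gid (e.g. 0 and 100); -L follows a symlink if the path is one
  ls -lnL "$token_path"
  echo "current uid=$(id -u) groups=$(id -G)"
  if [ -r "$token_path" ]; then
    echo "readable: yes"
  else
    echo "readable: no"
    return 1
  fi
}
```

Calling `check_token_access` with no argument inspects the token path from this issue; a different path can be passed as the first argument.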
We did discover a hacky workaround that allowed both (EFS and IAM) to work at the same time while using the specs recommended in the doc (uid: 0, fsGid: 0).
The workaround involved:
updating 2 settings to enable sudo to work in the container (https://z2jh.jupyter.org/en/stable/resources/reference.html#singleuser-storage-static was used as a point of reference):
singleuser.allowPrivilegeEscalation was set to true,
and we had to enable some setting in another spot that I can't recall off the top of my head.
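That allowed a short sequence of commands to work: copy the token to /home/jovyan/token with sudo, chown the copy to the current user, point AWS_WEB_IDENTITY_TOKEN_FILE at the copy, and rerun aws sts get-caller-identity.
(Before that, aws sts get-caller-identity was throwing a file system permission error when whoami returned jovyan.)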
The above allowed both efs & IAM to work. It was a hacky manual workaround, but it at least proved it was possible for both to work at the same time.
Removing the newly added settings (uid: 0, fsGid: 0) that the docs (https://z2jh.jupyter.org/en/stable/kubernetes/amazon/efs_storage.html) suggested adding to make EFS work brought EFS back into a broken state, but fixed IAM. (We basically rolled back to the previous config.)
When IAM was working, ls -lah /var/run/secrets/eks.amazonaws.com/serviceaccount/token
showed jovyan had access to it.
(I think the file system permissions were set to user_id:group_id, 1000:100, which would correspond to jovyan:users)
Through trial and error, we discovered a solution that allowed both (EKS IAM and the EFS storage mount) to work at the same time.
We went against what the docs recommended and set uid: 0 with fsGid left blank, which I think has an explicit default of fsGid: 100 (a file system group named users).
I think we also left enable root or singleuser.allowPrivilegeEscalation enabled from our testing, but I don't recall whether that was actually needed.
That allowed both (AWS IAM calls and EFS file system access) to work. I'll try to recall some observations of the setup.
Even though uid: 0 was set, startup scripts built into the container made it so that the user you got when requesting an interactive terminal through the web GUI was jovyan, as confirmed with the whoami command.
In that ideal config, ls -lah /var/run/secrets/eks.amazonaws.com/serviceaccount/token
showed 0:100 (owned by user "root" (0), with the group named "users" (100) having access), which allowed AWS CLI commands run as jovyan to keep working.
Thanks for the notes. I think the EFS doc is aimed at a relative newcomer to AWS. If you've got more experience of AWS and you've got some time it might be worth checking the latest offerings from AWS. For example, it looks like there's an EKS CSI driver for EFS: https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html
If you have a chance to look at this please let us know if it's a better replacement for the current instructions!
Background Context:
Bug description
The following docs seem to suggest a problematic configuration:
https://z2jh.jupyter.org/en/stable/kubernetes/amazon/efs_storage.html
The gist of the config problem is:
Here's a screen-snip of what the docs looked like at the time of the issue:
Proposed change of better suggested configuration:
I think the recommended config should look more like
uid: 0
fsGid: 100 (or blank/omitted entirely)
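As a rough illustration, the proposed settings could be written to a Helm values override like the one below. This is a minimal sketch under assumptions, not a tested chart configuration: `write_efs_values` and the `values-efs.yaml` file name are hypothetical, and placing `CHOWN_HOME` under `singleuser.extraEnv` is a guess at how the docs' setting reaches the container.

```shell
# Hypothetical helper: write a values override containing the proposed settings.
write_efs_values() {
  out_file="${1:-values-efs.yaml}"   # assumed file name
  cat > "$out_file" <<'EOF'
singleuser:
  uid: 0        # start as root so the startup scripts can chown the EFS home
  fsGid: 100    # group "users"; omitting this key is the alternative proposed above
  extraEnv:
    CHOWN_HOME: "yes"   # assumption: carried over from the docs' config as an env var
EOF
  echo "wrote $out_file"
}
```

The resulting file could then be passed to `helm upgrade` with `--values`, alongside whatever other values the deployment already uses.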
How to reproduce
Your personal set up
Commands such as aws sts get-caller-identity were run from the web terminal. So the web terminal was leveraging both EKS IAM and the EFS storage mount.
Expected behavior
(like aws sts get-caller-identity)
Actual behavior
Notes:
uid: 0
fsGid: 0
CHOWN_HOME: "yes"
Which was mentioned in their docs
https://z2jh.jupyter.org/en/stable/kubernetes/amazon/efs_storage.html
The hacky workaround (enabling sudo in the container) allowed the following commands to work:
sudo cp /var/run/secrets/eks.amazonaws.com/serviceaccount/token /home/jovyan/token
sudo chown $USER:$USER /home/jovyan/token
export AWS_WEB_IDENTITY_TOKEN_FILE=/home/jovyan/token
aws sts get-caller-identity