-I hit several errors about GPUs not being found, or (in the case of rootful Docker with usernetes) networking that never worked. I had a mangly device error that was resolved when I updated the container to the latest version (now 2 years old). Another unexpected issue had to do with data. I had prepared data using the old container, and when I tried to use it with the newer version, it wouldn't validate and the container would try to download it again. Given that the download links weren't working at the time, I couldn't run anything. I had to ensure that the data matched the container. There is more on that [here](https://github.com/converged-computing/flux-usernetes/tree/main/google/gpu/docker). We ultimately built our own container with the data included to ensure it is available, so we won't spend time during our experiments downloading it. Side note: in that exercise I learned that I could convert a Python egg to a zip, unzip it to explore, and then make changes and repackage it into an egg! YOLO!
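For anyone curious about the egg trick, here is a minimal sketch of the idea, assuming the egg is the zipped kind (an egg can also be a plain directory, in which case there is nothing to unzip). The function names `unpack_egg` and `repack_egg` and the file names in the demo are hypothetical, made up for illustration; this is not the exact set of steps used in the linked repository.

```python
import tempfile
import zipfile
from pathlib import Path


def unpack_egg(egg_path, dest_dir):
    """Extract a zipped egg (a zip archive with an .egg suffix) into dest_dir."""
    with zipfile.ZipFile(egg_path) as archive:
        archive.extractall(dest_dir)


def repack_egg(src_dir, egg_path):
    """Zip the contents of src_dir back into an archive at egg_path.

    egg_path should live outside src_dir so the archive does not include itself.
    """
    src_dir = Path(src_dir)
    with zipfile.ZipFile(egg_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the egg root, with forward slashes.
                archive.write(path, path.relative_to(src_dir).as_posix())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        # Build a tiny stand-in egg (hypothetical contents) to experiment on.
        egg = tmp / "demo.egg"
        with zipfile.ZipFile(egg, "w") as archive:
            archive.writestr("pkg/__init__.py", "VALUE = 1\n")
            archive.writestr("EGG-INFO/PKG-INFO", "Name: demo\n")

        work = tmp / "work"
        unpack_egg(egg, work)                                       # explore
        (work / "pkg" / "__init__.py").write_text("VALUE = 2\n")    # change
        repack_egg(work, tmp / "demo-patched.egg")                  # repackage
```

The repackaged file only works as a drop-in replacement if the `EGG-INFO` directory is kept at the archive root, which the relative paths above preserve.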