
feat(cache): separate python installation from base image by adding pre-built remote cache #1360

Open
Kaiyang-Chen opened this issue Dec 30, 2022 · 5 comments

Comments

@Kaiyang-Chen
Contributor

Description

In the current LLB compilation (shown in the figure below), we pull the base (custom) image in the first layer. This means that if the user changes the base image (a different CUDA version, a different OS, etc.), every cache entry from previous builds will miss.
[Figure: current LLB compilation graph]
Under my network conditions, creating the user group and installing Python with conda took around 1 minute. I think this step can be sped up by leveraging a pre-built remote cache for each Python version, built from a fixed image. As shown in the figure below, whenever the user changes the base image, we can simply pull the cached llb.Diff(fixStage, pythonStage) result and llb.Merge() it with the base image.
[Figure: proposed graph using llb.Diff(fixStage, pythonStage) and llb.Merge() with the base image]
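To make the idea concrete, here is a toy model of the diff-then-merge flow. It does not call the BuildKit API: `Snapshot`, `diff`, `merge`, and `pythonLayer` are hypothetical stand-ins for an LLB state, `llb.Diff`, `llb.Merge`, and the cached layer, and a filesystem is reduced to a map from path to content. File deletions are ignored for brevity, and the image names are only illustrative.

```go
package main

import "fmt"

// Snapshot is a toy filesystem state: path -> file content.
// It is a stand-in for an LLB state, not a BuildKit type.
type Snapshot map[string]string

// diff returns the files that are new or changed in upper relative to lower.
// Deletions are ignored to keep the sketch short.
func diff(lower, upper Snapshot) Snapshot {
	out := Snapshot{}
	for path, content := range upper {
		if old, ok := lower[path]; !ok || old != content {
			out[path] = content
		}
	}
	return out
}

// merge applies layers on top of base in order; later layers win on conflicts.
func merge(base Snapshot, layers ...Snapshot) Snapshot {
	out := Snapshot{}
	for path, content := range base {
		out[path] = content
	}
	for _, layer := range layers {
		for path, content := range layer {
			out[path] = content
		}
	}
	return out
}

// pythonLayer builds the Python layer once, on top of a fixed image,
// independently of whichever base image the user picks later.
func pythonLayer() Snapshot {
	fixStage := Snapshot{"/etc/os-release": "fixed"}
	pythonStage := merge(fixStage, Snapshot{"/opt/conda/bin/python": "python-3.9"})
	return diff(fixStage, pythonStage)
}

func main() {
	layer := pythonLayer() // computed once, reusable as a cache
	for _, base := range []Snapshot{
		{"/etc/os-release": "ubuntu20.04"},
		{"/etc/os-release": "cuda11.6"},
	} {
		env := merge(base, layer)
		fmt.Println(env["/etc/os-release"], env["/opt/conda/bin/python"])
	}
}
```

The point of the sketch is that `pythonLayer()` never looks at the user's base image, so the same layer can be merged onto any base without being rebuilt.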

Potential problems

  • I am not sure whether buildkit supports exporting the llb.Diff() layer. If it does not, we can work around this by caching pythonStage and performing the llb.Diff() manually.
  • The above method modifies /etc/passwd and /etc/group when creating the user group. When merging with the base image, conflicts in these files between different OSes might cause problems.
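The second problem can be illustrated with a small check that lists the paths a cached layer would overwrite with different content. This is only a sketch on toy path-to-content maps, not BuildKit code; the `conflicts` helper, the `envd` user entry, and the file contents are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// conflicts lists the paths present in both the base image and the cached
// layer with different contents. Both arguments are toy path -> content maps.
func conflicts(base, layer map[string]string) []string {
	var paths []string
	for path, content := range layer {
		if existing, ok := base[path]; ok && existing != content {
			paths = append(paths, path)
		}
	}
	sort.Strings(paths) // map iteration order is random; sort for stable output
	return paths
}

func main() {
	base := map[string]string{
		"/etc/passwd": "root:x:0:0\n",
		"/etc/group":  "root:x:0:\n",
	}
	// Hypothetical layer that added a user and group on top of the fixed image.
	layer := map[string]string{
		"/etc/passwd":           "root:x:0:0\nenvd:x:1000:1000\n",
		"/etc/group":            "root:x:0:\nenvd:x:1000:\n",
		"/opt/conda/bin/python": "python-3.9",
	}
	fmt.Println(conflicts(base, layer))
}
```

Any path reported here would be replaced wholesale by the layer's copy in a plain merge, so entries that exist only in the base image's version of the file would be lost.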

Other thoughts

If exporting the llb.Diff() layer is possible, we might be able to pre-build caches for large packages such as pytorch and CUDA-related components, and use them as plug-ins for the base image. Since package downloads take a significant share of the time when building a docker environment, this should speed up the build process a lot.
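For such plug-in caches to be reusable, their lookup key should depend only on what goes into the layer, not on the user's base image. A minimal sketch of one possible keying scheme follows; `cacheKey`, the registry reference, and the package pins are assumptions for illustration and not something buildkit or envd defines.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// cacheKey derives a key for a pre-built layer from the inputs that determine
// its content. The user's base image is deliberately not an input.
func cacheKey(fixedImage, pythonVersion string, packages []string) string {
	pkgs := append([]string(nil), packages...) // copy so the caller's slice is not reordered
	sort.Strings(pkgs)
	input := strings.Join(append([]string{fixedImage, pythonVersion}, pkgs...), "\n")
	sum := sha256.Sum256([]byte(input))
	return hex.EncodeToString(sum[:])[:16]
}

func main() {
	key := cacheKey("ubuntu:20.04", "3.9", []string{"torch==1.13.1"})
	fmt.Println("example.com/cache/python:" + key) // hypothetical registry reference
}
```

Sorting the package list makes the key independent of the order in which the user lists dependencies, while any change to the Python version or a package pin yields a different key.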


Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.

@Kaiyang-Chen changed the title from "feat: separate python installation from base image by adding pre-built remote cache" to "feat(cache): separate python installation from base image by adding pre-built remote cache" on Dec 30, 2022
@VoVAllen
Member

VoVAllen commented Jan 3, 2023

Thanks for your contribution! I think the core problem here is on the buildkit side: how we can inspect the llb.Diff node, and whether it is possible to export it separately. Can you raise the question in the buildkit repo and link it here as well? Thanks!

@kemingy
Member

kemingy commented Jan 3, 2023

  • LLB Merge could be problematic when there are overlapping directories.
  • Maintaining a remote cache for different Python versions also needs to take security updates into account.

You need to check the v1 graph. It should support Python with or without Conda/Mamba.

@gaocegege
Member

Thanks for the proposal!

We can optimize the workflow further. For example, we can investigate whether we could merge the pytorch/tensorflow packages into the environment image directly, instead of downloading and installing them from PyPI.

The tf/torch packages are very large, so it may be faster to keep a remote cache for them.

@VoVAllen
Member

VoVAllen commented Jan 3, 2023

The same goes for the starship package: it is hosted on the github domain, which makes it hard to install when there are network issues and we don't have a cache.

@gaocegege
Member

Yep, starship. It is hard to install here in China.
