Improve time and space efficiency of backend Docker container build #3006

Open
MoralCode opened this issue Feb 19, 2025 · 6 comments

@MoralCode

MoralCode commented Feb 19, 2025

Is your feature request related to a problem? If so, please describe the problem:
Disk space limitations make it hard to build the backend Docker image (it seems to need at LEAST 6GB before it failed on my Fedora VM, possibly 12GB if Podman needs 2x the space to write the final image).

Potential solutions:
After looking through the Dockerfiles, I noticed a few things that could be improved:

  1. Most of the space taken up by the image is due to Rust and Golang dependencies that likely aren't needed at runtime, since both of those languages ship fat binaries with everything in one place (as far as I know; I only have passing experience with both of those languages).
  2. Relatedly, the source code for these builds, which is cloned as part of the build, is also kept around. Removing it could save an additional 334M.
  3. The build, specifically install-workers-deps.sh, includes the NLTK popular metapackage, which pulls in lots of other packages. Are all of these sub-packages being used? (There's between 8k and 139M of savings depending on what can be excluded; I have a list of them by size in my notes.)
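
For the NLTK point, one option would be to download only the data packages the workers actually load instead of the whole `popular` collection. A minimal sketch, assuming (hypothetically) that only `punkt` and `stopwords` are needed; the base image tag and the package names are placeholders, and the real list would have to come from auditing the worker code:

```dockerfile
# Assumed base image for illustration; Augur's real base image may differ.
FROM python:3.11-slim

RUN pip install --no-cache-dir nltk

# Download only the named data packages (placeholders) rather than "popular".
# /usr/local/share/nltk_data is one of the directories NLTK searches by default.
RUN python -m nltk.downloader -d /usr/local/share/nltk_data punkt stopwords
```

Any package missing from the list would surface as a `LookupError` at runtime, so a change like this would need a test run of each worker that uses NLTK.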

My remaining questions so far:

  • Is there a reason that these dependencies are being built from source?
    • If not, would it be better to contribute these build steps upstream to places like OpenSSF, so Augur can rely on fetching the binaries from a popular registry (like crates.io or whatever Golang uses)?

Additional context:
Slack thread (in CHAOSS Slack) about this: https://chaoss-workspace.slack.com/archives/C0226ELG6R4/p1739815260328199

This PR seems to at least have been attempting a partial solution for the first two parts of this by creating a separate build step: #2947

The large image size may also cause problems for users on older versions of Podman, which may have trouble committing images that contain very large layers.

@MoralCode
Author

Also likely related: #2982

@cdolfi

cdolfi commented Feb 19, 2025

@GregSutcliffe will have some thoughts on this. I know he has mentioned some things around Go, and that there might be a significantly more efficient way to get the git info without needing a full clone.

@MoralCode
Author

I'm refactoring the Docker container into more of a builder-pattern style (with separate containers for Golang and Rust).

Are there any dependencies relying on Rust right now? I don't see any.
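
A rough sketch of what such a builder pattern could look like as a multi-stage build. Everything specific here is an assumption for illustration and not taken from Augur's actual Dockerfiles: the base image tags, the `TOOL_REPO_URL` build argument, the `/out/tool` path, and the idea that the Go tool's main package sits at the repository root.

```dockerfile
# Stage 1: throwaway builder image holding the Go toolchain and the cloned source.
FROM golang:1.22 AS go-builder

# Hypothetical build argument: URL of the Go tool's repository.
# Supply it with: --build-arg TOOL_REPO_URL=<repo url>
ARG TOOL_REPO_URL

WORKDIR /src

# Shallow clone, then build with CGO disabled so the binary is statically
# linked and does not depend on shared libraries from the builder image.
RUN git clone --depth 1 "${TOOL_REPO_URL}" . \
    && CGO_ENABLED=0 go build -o /out/tool .

# Stage 2: runtime image. Only the compiled binary is copied over; the
# toolchain, module cache, and cloned source stay behind in stage 1.
FROM python:3.11-slim
COPY --from=go-builder /out/tool /usr/local/bin/tool
```

Because the final image only receives what `COPY --from` names explicitly, the cloned source and compiler never reach it, which would address the first two points in the issue description at once.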

@Ulincsys
Contributor

@MoralCode
It looks like we are using Rust in order to use cargo to install geckodriver. Sean added that install to the Dockerfile in 5a14c42

@MoralCode
Author

> @MoralCode It looks like we are using Rust in order to use cargo to install geckodriver. Sean added that install to the Dockerfile in 5a14c42

It seems like that has since been removed and replaced with a tarball method of installing Firefox and geckodriver in 9eb4b61.

@cdolfi

cdolfi commented Feb 24, 2025

@sgoggins do you know of any Rust dependencies?
