Performance issues when running in a Docker container on CircleCI #41
Comments
Actually, I now believe it may be related to resolving the files that need to be updated. Is there a way to optimize this?
Not likely. Note that the algorithm used is pretty simple: we fetch metadata about the objects to do MD5 hash compares, which is obviously faster than uploading everything on every run, and we stream the files, so I don't see any obvious pitfalls. But then I'm just guessing.
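For reference, here is a minimal sketch of the compare step described above, using the AWS SDK for Go v1. This is not s3deploy's actual code, and the bucket, key, and path in `main` are placeholders; it only illustrates the idea of fetching metadata with `HeadObject` and streaming the local file through MD5.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

// localMD5 streams the file through MD5, so large files are never held
// in memory in full -- but every file is still read end to end.
func localMD5(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := md5.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// needsUpload asks S3 only for the object's metadata (HeadObject) and
// compares the remote ETag with the local hash. For objects uploaded in
// a single part the ETag is the MD5 of the content.
func needsUpload(svc *s3.S3, bucket, key, path string) (bool, error) {
	sum, err := localMD5(path)
	if err != nil {
		return false, err
	}
	head, err := svc.HeadObject(&s3.HeadObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	})
	if err != nil {
		return true, nil // treat a missing/unreadable object as "needs upload"
	}
	remote := strings.Trim(aws.StringValue(head.ETag), `"`)
	return remote != sum, nil
}

func main() {
	sess := session.Must(session.NewSession())
	svc := s3.New(sess)
	changed, err := needsUpload(svc, "my-bucket", "images/logo.png", "public/images/logo.png")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("needs upload:", changed)
}
```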
I see. So the long run time is likely a combination of fetching metadata plus calculating MD5s for so many files (many of which are larger images). I've seen similar packages use ETags and store them in a locally cached JSON file that could be checked in to source control. Might this be a more performant diffing strategy?
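As a rough illustration of that suggestion (this is not anything s3deploy supports -- the file name `.s3deploy-cache.json` and its structure are made up), the cache would just be a JSON map from object key to the hash of the version last uploaded, committed alongside the site:

```go
package main

import (
	"encoding/json"
	"os"
)

// etagCache maps S3 object keys to the hex MD5 of the version last
// uploaded, so the diff can be computed without any calls to S3.
type etagCache map[string]string

func loadCache(path string) (etagCache, error) {
	c := etagCache{}
	b, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		return c, nil // first run: start with an empty cache
	}
	if err != nil {
		return nil, err
	}
	return c, json.Unmarshal(b, &c)
}

func saveCache(path string, c etagCache) error {
	b, err := json.MarshalIndent(c, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, b, 0o644)
}

// changed reports whether a file differs from what the cache says was
// last uploaded. Note the local hash still has to be computed, which is
// the cost the next comment points out.
func changed(c etagCache, key, localSum string) bool {
	return c[key] != localSum
}

func main() {
	cache, _ := loadCache(".s3deploy-cache.json")
	if changed(cache, "images/logo.png", "d41d8cd98f00b204e9800998ecf8427e") {
		// the upload would happen here, then the cache entry is refreshed
		cache["images/logo.png"] = "d41d8cd98f00b204e9800998ecf8427e"
	}
	_ = saveCache(".s3deploy-cache.json", cache)
}
```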
Sure, but it would be harder to get right (for one, it would break if you build from different PCs/CI servers). Also, you are the first person to report problems in this department. It works fine for me, and I'm the primary user (I wrote this for myself), so it would be a hard sell to complicate this piece of software to solve problems that ... I don't have.
Also note that these ETags are stored in S3 as metadata, so I suspect it is the local MD5 hash calculation that takes the time in this scenario, and that would not change with a local cache.
I'm getting the same issue:
So,
@rvangundy I ended up using If you have a use case of a large number of files that don't change often, it seems like the best option 👍
@petems how do you know, using potentially old and stale cache files, that it hasn't changed? I ask because I would love to improve on this if possible -- I guess it could be possible if S3 could report a "last modified" timestamp for the entire bucket.
For my use case I'm the only pusher/owner on the repo, so I can generate a new cache file, commit it to the repo, and that covers 90% of the deploy. I only really have to do it once to cache the last 3 or so years of blog posts and images; after that I'm not that fussed about keeping it updated, as it's already sped things up. If there's something I'm missing and a way to do the same with

NB: I should also probably try updating the Docker image, as it's using a version of s3deploy from 9 months ago, so I need to check if it's still the case with a more recent build...
Looking at this now, I think there was a similar issue on Hugo's. Other than that, you may want to test reducing the number of workers, e.g.
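On the workers point, here is a generic sketch of why a smaller pool can help (this is not s3deploy's actual worker code; the file names are placeholders): with a fixed pool, the number of files being hashed and uploaded at once -- and therefore memory and open connections -- is bounded by the worker count rather than by the number of files.

```go
package main

import (
	"fmt"
	"sync"
)

// process stands in for the per-file work: hash, compare, upload.
func process(path string) {
	fmt.Println("processed", path)
}

// run fans the file list out to a fixed number of workers; only that
// many files are in flight at any moment.
func run(paths []string, workers int) {
	jobs := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				process(p)
			}
		}()
	}
	for _, p := range paths {
		jobs <- p
	}
	close(jobs)
	wg.Wait()
}

func main() {
	// fewer workers, e.g. 2 instead of 8, trades throughput for memory
	run([]string{"a.html", "b.jpg", "c.css"}, 2)
}
```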
I'm running s3deploy on CircleCI using version 2.3.0. The performance is very poor, even though I've increased the size of my container to 8 vCPUs and 16 GB of RAM and increased the number of s3deploy workers to 8. Often the process runs out of memory and is shut down by Circle. When it does succeed, it takes a very long time, close to 10 minutes or more. My project is quite large, and the particular set of uploads that breaks involves lots of image files in several subfolders.
I've tried a variety of optimizations, in particular modifying the config file to optimize the regexes and reducing the search paths (my hunch being that some deep recursion is happening and requires heavy memory use).
I'm able to run the same routine on OS X with specs similar to the Docker container I'm running on CircleCI, and it runs very quickly in my local environment.
Also note that there are no new files that need uploading, so this slowness is not due to upload speeds. Are there any performance tips anyone can offer?