-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Activate task becomes very slow when you have a lot of deployed items #120
Comments
@vitch I understand the issue and it seems from our discussions at the time that we had some concerns about this eventuality but decided the benefits outweighed the costs. The contract of the hooks is that they will fetch all the revisions so I am skeptical that general solutions that don't do this would be guaranteed to be backward compatible user's deployment configurations. As a low-tech solution, have you considered deleting or archiving your 59,000 oldest revisions? |
Yeah - I meant to mention that we'd considered that. The same folders are used to power our special historical URLs and I wanted to avoid adding special casing there... I agree that it would be hard to do this in a backward compatible way which is why I split it out from #119. I guess one other option would be to have a config flag to allow people to opt in to a new behaviour (where I'm essentially going to build that out anyway and test it in our environment to see if there are any unforeseen consequences. I can put up a PR for it and we can discuss there - if you don't think it's a good match for this plugin then I'm happy to maintain a forked version for our needs... If you have any concrete examples of |
I don't see any harm in having a config flag for this plugin that allows you to opt into the behavior you're describing. Then you wouldn't have the hassle of maintaining a fork. |
Cool. Any preferences on a name? (naming things is hard - those are more in the vein of brainstorming than actual suggestions) |
In fact, it looks like Looking at the hooks documentation and re-reading the threads above, I'm still not too clear on why (it looks like I can get another ~20% speed up by changing |
Another possible approach is to cache the result of That (in combination with #119) means we're loading the list of all deployed versions once instead of three times when |
There are cases where we might want to page through the list of objects available on S3 but we don't need to keep going once we've found what we are looking for. This commit updates `listAllObjects` to accept an `until` argument - if this is provided then when a new page of results is loaded we check if `some` of them match the `until` - if so we don't bother loading another page. A concrete use-case for this is in ember-cli-deploy#120 - sometimes we just want to find the currently active revision so we only need to loop through the "pages" on S3 until we do.
If you set this option in your S3 config then we will only load enough revisions from S3 to find the currently active version and then we will stop. The assumption here is that the main use-case for `fetchInitialRevisions` is to be able to do some sort of changelog or audit log from another plugin where we would want to know the details of the previously active revision. Since looping over revisions from S3 can be slow when you have too many of them stored (see ember-cli-deploy#120) we provide an option to short circuit that.
If you pass this flag then `fetchRevisions` doesn't attempt to fetch _any_ revisions from S3. This is useful because fetching the revisions from S3 can take a long time (see ember-cli-deploy#120) and if you don't have any reason to use them then this time is wasted. In our use case I have hooked this flag up to an environment variable. e.g. - in `deploy.js`: ``` disableFetchRevisions: process.env.DISABLE_FETCH_REVISIONS ``` Then we can set that environment variable when we are calling the `activate` command and leave it off at any other time (so e.g. `ember deploy:list` works as expected - if you have the time to wait)
If you set this option in your S3 config then we will only load enough revisions from S3 to find the currently active version and then we will stop. The assumption here is that the main use-case for `fetchInitialRevisions` is to be able to do some sort of changelog or audit log from another plugin where we would want to know the details of the previously active revision. Since looping over revisions from S3 can be slow when you have too many of them stored (see ember-cli-deploy#120) we provide an option to short circuit that.
If you pass this flag then `fetchRevisions` doesn't attempt to fetch _any_ revisions from S3. This is useful because fetching the revisions from S3 can take a long time (see ember-cli-deploy#120) and if you don't have any reason to use them then this time is wasted. In our use case I have hooked this flag up to an environment variable. e.g. - in `deploy.js`: ``` disableFetchRevisions: process.env.DISABLE_FETCH_REVISIONS ``` Then we can set that environment variable when we are calling the `activate` command and leave it off at any other time (so e.g. `ember deploy:list` works as expected - if you have the time to wait)
There are cases where we might want to page through the list of objects available on S3 but we don't need to keep going once we've found what we are looking for. This commit updates `listAllObjects` to accept an `until` argument - if this is provided then when a new page of results is loaded we check if `some` of them match the `until` - if so we don't bother loading another page. A concrete use-case for this is in ember-cli-deploy#120 - sometimes we just want to find the currently active revision so we only need to loop through the "pages" on S3 until we do.
We've been happily using
ember-cli-deploy
and this plugin for over 6 years - thanks!As a consequence of this, we've built up a large history of previously deployed versions of our apps (which are available via special URLs for testing and other purposes).
I noticed that our activation stage is taking a long time (we have a bunch of apps across a bunch of environments with a bunch of historical versions) and digging into it I found that a lot of this is due to this plugin needing to list the revisions available on S3 for various reasons.
In #119 I optimised some of the easy cases where it was possible to avoid some of this communication with S3 but we're still left with
fetchInitialRevisions
andfetchRevisions
.I was trying to figure out why these functions exist/ are called and whether we could optimise them in some way.
I found some relevant discussions:
fetchRevisions
-hook ember-cli-deploy#209fetchRevisions
hook to deploy and activate pipelines ember-cli-deploy#323 (comment)From these, I wasn't too clear exactly how important these hooks might be. If they impose a high cost on some users (those with full buckets on S3) then could we change their behaviour or somehow speed them up?
It seems like the prime use-case for
fetchInitialRevisions
is to generate a changelog and indeed we have code which adds an entry to an audit log with the previously active revision and the new one. If other plugins only need to know the previously active revision then we can optimise by at least bailing out of thelistObjectRecursively
loop once we've found the active revision.I'm interested in the appetite here for changes around this. If you think this is a problem that most people don't suffer from then we can fork the plugin and change the behaviour for our purposes. But if you think it's something that this plugin should handle then I'd love to talk about possible approaches and exactly what we need from
fetchInitialRevisions
andfetchRevisions
The text was updated successfully, but these errors were encountered: