Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[POC] Implement caching of .build for unit-tests #2874

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

MahdiBM
Copy link
Contributor

@MahdiBM MahdiBM commented Sep 10, 2024

Motivation:

Faster tests. Less CI time / costs.

Not sure if this is what you would want at all, but if you do, I think this (possibly with minor fixes) should work properly.

Modifications:

Add an option to swift_matrix to do caching of .build.

For now, enable this caching in unit-tests.

Result:

Faster tests. Less CI time / costs.

@@ -108,6 +112,16 @@ jobs:
if: ${{ matrix.swift.enabled }}
# https://github.com/actions/checkout/issues/766
run: git config --global --add safe.directory ${GITHUB_WORKSPACE}

- name: Restore .build
if: ${{ matrix.swift.enabled && (inputs.cache_key_prefix != '') && !(github.run_attempt > 1) }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!(github.run_attempt > 1) will be useful when something is inconsistent in CI and you're retrying the job.
Although you can delete cache from the GitHub Actions UI too, on web.

@MahdiBM
Copy link
Contributor Author

MahdiBM commented Sep 10, 2024

@FranzBusch WDYT?

@FranzBusch
Copy link
Member

We have intentionally not adopted caching yet since it trades potential time saving for cost. I am going to kick-off the CI here since I want to see what the timing impact actually is and see if the trade off in time vs cost is worth it.

@MahdiBM
Copy link
Contributor Author

MahdiBM commented Sep 11, 2024

@FranzBusch are you referring to GitHub costs?
Since our org at work used to use GitHub Actions Runners and caching, and we did have to pay for our usage, my off-hand guess would be it'll actually decrease costs. Caching .build has a noticeable effect on run times, and as far as I can see caching is free for up to 10GB in GitHub (in my experience it actually tolerates and allows for more usage), and exceeding that limit will just delete the oldest cache.

Furthermore, you can use an S3 bucket for caching with runs-on/cache.
Like i've mentioned before, I'd even highly recommend moving the runners to RunsOn-managed runners. It'll be a MASSIVE (5x-10x+) cost saver as well as a pretty nice time-cut from the CIs. It'll run everything on your own AWS account, and is pretty easy to set up.
To be clear I have no affiliation with RunsOn 😅.
The only reason I can rationalize you-all not wanting to use RunsOn, is organizational approvals that you might need to have, and perhaps security checks that RunsOn might need to go through. I think RunsOn has a 1500$ yearly plan which gives you access to their full code, so that might be helpful.

@MahdiBM
Copy link
Contributor Author

MahdiBM commented Sep 11, 2024

@FranzBusch based on the code i put there, it won't use the cache in a re-run. Not that, that code is mandatory, but i'll just push something to the branch to see how things will go with the current changes.

@FranzBusch
Copy link
Member

@MahdiBM Can we make re-runs work. Then I can trigger the CI again to see the impact

@MahdiBM
Copy link
Contributor Author

MahdiBM commented Sep 11, 2024

@FranzBusch there you go

@MahdiBM
Copy link
Contributor Author

MahdiBM commented Sep 11, 2024

So it appears that for swift-nio, it decreases the nightly unit-tests runs from 4m40s to 4m20s. A pretty low difference.

Though this difference will get noticeably bigger, the heavier the package / builds are.

I'm not sure if there are such heavy repos in the swiftlang/Apple orgs.

If you're considering maintaining standardized swift reusable workflows for public users, caching will be necessary IMO in such a workflow.

I'll let you decide if you think it's worth it to have the option for caching in the current workflows, or not.

@FranzBusch
Copy link
Member

I keep seeing this in the logs

Failed to save: Unable to reserve cache with key unit-tests-swift-nio-build-nightly-6.0-Linux-307731ca0421d5a69a068b62a5d04c4356279abcf0cfd0d71aa719a762bef5a7, another job may be creating this cache. More details: Cache already exists. Scope: refs/pull/2874/merge, Key: unit-tests-swift-nio-build-nightly-6.0-Linux-307731ca0421d5a69a068b62a5d04c4356279abcf0cfd0d71aa719a762bef5a7, Version: 71ee539a69b4092a757e60e07a31650a220dbc09d2d02876da35435f74632221
Warning: Cache save failed.

Any idea @MahdiBM?

@FranzBusch
Copy link
Member

So it appears that for swift-nio, it decreases the nightly unit-tests runs from 4m40s to 4m20s. A pretty low difference.

That's kinda what I expected tbh. The time it takes to push and download the cache vs just running a build is too minimal.

If you're considering maintaining standardized swift reusable workflows for public users, caching will be necessary IMO in such a workflow.

These workflows for now are really only intended to be used by library packages in the swiftlang, apple and swift-server organization. So at this time I think we can hold off adding the caching. Though I appreciate you taking the time to create this PR to show the impact of it!

uses: actions/cache/restore@v4
with:
path: .build
key: "${{ inputs.cache_key_prefix }}${{ github.event.repository.name }}-build-${{ matrix.swift.swift_version }}-${{ runner.os }}-${{ (github.event.pull_request.base.sha != null) && github.event.pull_request.base.sha || github.event.after }}"
Copy link
Contributor Author

@MahdiBM MahdiBM Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

${{ (github.event.pull_request.base.sha != null) && github.event.pull_request.base.sha || github.event.after }}
is the main thing that can be different for a specific swift version.
It allows the cache to be updated after each push to the main branch, since it'll resolve to github.event.after.

It either resolves to the base branch's SHA in PR events, or to the new SHA of the branch in push events.
So each PR will be looking for their base branch's cached .build.
And on a push to the main branch, the cache is refreshed.
It will be suboptimal, but even the event isn't a push or PR, the CI should still use the cache. But maybe also waste some seconds on trying to re-save the cache.

One confusing behavior of the caches action is that in a PR, you cannot create a cache that the next commits in a PR can use. That's why the cache "key" doesn't try to do something like that.
You can only access the repo's primary-branch's caches. [Edit: actually not 100% sure about this, this "rule" might be even more weird, but still, what is in this PR makes sense]

FYI another gotcha of the caches action is that you cannot delete OR override value of a cache "key" (not programmatically, otherwise there is a delete button in GitHub web UI).

@MahdiBM
Copy link
Contributor Author

MahdiBM commented Sep 11, 2024

@FranzBusch Without the cache-save step which shouldn't be usually triggered, there will be a 30s more time cut, at which point it might be worth it (50s cut in this repo for nightly debug builds for unit tests).
If there is something that uses release builds, caching can have a bigger impact on those as well.
I generally think 50s-60s of a time cut is "good" considering the whole CI doesn't take 5 mins usually, but you should still decide if it's worth it, or if you want to just have this anyway and test how it works out.

I figured out why the cache-save is hit at all, when it shouldn't be.
Before the restore step, we haven't yet checked-out the package.
So ${{ hashFiles('./Package.resolved') }} resolves to just an empty string.
Then the restore-cache step will return cache-hit=false.
And that's why the cache-save step is triggered when it shouldn't (based on the steps.restore-cache.outputs.cache-hit != 'true' condition on the cache-save step.)

Now, we can check out the package and everything will be fine, but the problem is this repository does not commit the Package.resolved, which means hashFiles('./Package.resolved') will still resolve to an empty string, and we won't have a proper way to know when we should refresh the .build cache. Instead, we can use the base branch's commit sha like I mentioned in a comment in the code.

@MahdiBM
Copy link
Contributor Author

MahdiBM commented Sep 11, 2024

@FranzBusch TLDR; The cache-save step should no longer be triggered now, if the cache save is going to fail anyways.
You can give it another try if you want.

@MahdiBM MahdiBM marked this pull request as ready for review September 11, 2024 21:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants