-
-
Notifications
You must be signed in to change notification settings - Fork 17.8k
workflows/merge-group: init #431146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
workflows/merge-group: init #431146
Conversation
fdd597c to
6a2009f
Compare
emilazy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! 💜
I hope we can get eval in here soon!
.github/workflows/pr.yml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably don’t want the actual list of required jobs to be the same, though. GItHub gates merge queue entry on required checks succeeding, even if they’re redundant to the ones checked in the merge queue. This is annoying and slows down merge velocity for no good reason – e.g. if you force‐push to fix a trivial typo, hit merge, and have the same required jobs on both PRs and in the merge queue, you have to wait 2 × cycle time for merge for no reason.
So, in the long run, if we check everything in CI, we probably shouldn’t gate queue entry at all. For now, we should probably just remove things that are also checked in the queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. This comment was about the name, not the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably don’t want the actual list of required jobs to be the same, though.
This comment was only about the name. This needs to be the same, even if the checks contained in the PR / Merge Group workflows are not the same, or even entirely different. The comment has meanwhile been removed anyway.
GItHub gates merge queue entry on required checks succeeding, even if they’re redundant to the ones checked in the merge queue. This is annoying and slows down merge velocity for no good reason – e.g. if you force‐push to fix a trivial typo, hit merge, and have the same required jobs on both PRs and in the merge queue, you have to wait 2 × cycle time for merge for no reason.
Once we add Eval, this is unavoidable. We need to run Eval in the PR, to be able to label based on the results. And to run nixpkgs-review, etc.
And we surely want to run Eval in the merge queue as well.
Since Eval takes the most time, we can also run all the other checks as well. This is just a matter of resources, but not of cycle time.
So, in the long run, if we check everything in CI, we probably shouldn’t gate queue entry at all. For now, we should probably just remove things that are also checked in the queue.
I assume you mean to say that we'd still run these checks in the PR, but don't require them for addition to the merge queue. The problem with that is: You can't use auto-merge anymore. Which you can do right now already. I sure wouldn't want to potentially break other people's merge queue whenever I intend to use auto-merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, in the long run, if we check everything in CI, we probably shouldn’t gate queue entry at all.
I'm skeptical of this assertion. We should still run (most) CI on unfinished/unqueued PRs, as this is a much better feedback cycle than enqueing a PR and failing a group of PRs just to see what CI would do.
Even if you meant "run CI, but don't require it," I'm also skeptical of that: if we're running CI anyway, and it fails, why would we want to enqueue that PR?
At that point, the failing PR needs some attention; maybe just a rebase, maybe some actual changes, or maybe CI is broken. Either way, it shouldn't be enqueued in that state.
I could envisage postponing slow or expensive jobs until the final check in the queue, though. Assuming they are unlikely to need attention during a normal PR's lifecycle. There's a balance to be struck here between saving time+resources and only noticing CI issues after a PR is queued.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, in the long run, if we check everything in CI, we probably shouldn’t gate queue entry at all. For now, we should probably just remove things that are also checked in the queue.
I assume you mean to say that we'd still run these checks in the PR, but don't require them for addition to the merge queue. The problem with that is: You can't use auto-merge anymore. Which you can do right now already. I sure wouldn't want to potentially break other people's merge queue whenever I intend to use auto-merge.
I don't think GitHub's rulesets even properly support configuring required status checks separately for the PR and the queue. Even if we wanted to (which I don't think we do).
We can configure how each version runs, based on event (pull_request vs merge_group), but we cannot have it required only for the queue. The closest we could get to that would be to unconditionally skip the "no PR failures" job on the pull_request event, such that it is always considered successful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd need to be very careful not to do the same the other way around, aka canceling the merge group workflows when new events happen in the PR. Or alternatively... this could be a feature, leading to automatic cancellation of the merge queue for this PR.
FWIW, you have to do it explicitly in code, the concurrency stuff doesn’t support it. So we could implement whatever policy we wanted here, really.
That's not going to work well, I think. The
merge_groupevent context doesn't have information about the Pull Request in question.Also the workflow runs on the commit sha of the merged commit, thus we can't find the artifacts for this workflow with the labeler job. Thus anything that Eval produces here is useless for labeling purposes. Labeling will have to happen in the PR.
Hmm. GitHub’s UI shows what PRs a commit is part of (so I’d presume it’s accessible through the API too), and we can descend from the merge commit into the commits being merged. So I think looking up the corresponding PR is viable. Whether it’s worth the effort, I don’t know.
I see your point for conflict-prone automated treewides. I wonder how they interact with merge queues and the "jump the queue" feature, though. I assume you should be able to:
- cancel the existing queue
- rebase on latest master and run your automation
- run CI in you PR
- jump the queue
And all of that with a rather low probability of hitting another merge conflict. But maybe that doesn't work well.
My experience was that waiting for eval to see rebuild numbers was somewhat likely to lead to more conflicts in the truly big treewides even before we had required checks or the merge queue. OTOH I think eval got faster since then.
But mostly I think it’s strictly redundant; if we’re checking things in a queue, everything redundant to that outside of the queue ultimately has strictly advisory lint status. We can’t stop people putting up broken PRs; what we care about is preventing them from landing. Merge queues inherently result in some decrease of merge velocity, and I think we can get some of it back by not blocking entry into the queue and cancelling redundant jobs.
Hm, not entirely true. We do have some kinds of checks that are semi-required by design. Currently that's the "check cherry-picks" job. We require this job to run and succeed (for the error cases) and we require the committer to dismiss the automated review (for the warning cases). We surely want to run this check to completion before merge.
It could still just boot the PR out of the queue if it’s not happy, right? But I agree that it’s benign to require checks that should only take a few seconds anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we could explore ways to make things like eval only required for merge_group, while still having faster things like linting and cherry-pick validation required in pr.yml.
I'm still not convinced either way, but the example of conflict-prone PRs is valid.
IMO fine-tuning which checks run and/or are required by each workflow is something that should be done after we have the initial Merge Queue setup.
Wild idea: would it be reasonable to configure (some) of the required checks using labels? E.g. you could have a "conflict prone" label that tells pr.yml to skip some of its slower checks, or at least not require them in no PR failures.
That way it's an explicit opt-in to skip certain checks until the merge_group run, while other PRs could still use an "auto merge" that only enters the queue once CI looks green.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just think that this is plain a bug of GitHub merge queues. Like, I do not think that if you were designing a code review and merge queue system from scratch you would come up with the idea that even assessing a PR for merge should gate on the same checks outside the queue. The queue is the gate. Being able to optimistically enter into the queue and check changes efficiently while still enforcing monotonic properties of the merged branch is the advantage – otherwise you could achieve the same goal with required PR checks and enforcing al PRs to be up to date with HEAD. GitHub required checks are just designed wrong. If you look at the thread where they announced it there’s a bunch of people pointing out this bizarre misfeature, FWIW.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(To be clear, none of this is a blocker. But getting monotonic properties on our main branches while still having maximum concurrency of checks and merge velocity is why I’m so eager to see us move stuff into queues.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of stuff to think about, won't comment on most of it, but will keep thinking about this.
It could still just boot the PR out of the queue if it’s not happy, right?
If we can't detect the boundaries of where a PR starts in the list of commits between base...head of the merge queue, then we can't say which PR to boot out of the queue. We might kick out other PRs that we didn't intend to.
(This is all about checks which don't check the current head commit, but each commit on its own, as it is required for linting commit messages and such)
Of course, if we can find out which commits belong to each PR, this would be possible but from experience so far, the following is certainly not easy:
Hmm. GitHub’s UI shows what PRs a commit is part of (so I’d presume it’s accessible through the API too), and we can descend from the merge commit into the commits being merged. So I think looking up the corresponding PR is viable.
Introduces a basic merge queue workflow to initially only run lints. This will avoid accidentally merging changes which break nixfmt after its recent update to 1.0.0.
6a2009f to
04c039f
Compare
MattSturgeon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks
|
Adding the merge queue essentially means that we would take away the ability from committers to bypass "required status checks" in PRs, see discussion in NixOS/org#149 (comment). This shouldn't stop this PR from being merged, though. It's rather simple and the remaining discussion is only about how to actually the implement the branch ruleset (for now). |
|
Does introducing this require a corresponding change to the nixpkgs-merge-bot? |
That's a very important point, thanks for raising it. I looked into it:
My conclusion: The nixpkgs-merge-bot would very likely try to bypass the merge queue every time. Since the bot doesn't seem to be part of the nixpkgs-committers team, it would fail to do so. The merge-bot would be broken. I think the merge-bot needs to replicate the logic of the CLI tool: Check whether the PR needs a merge queue and if yes enable auto-merge via graphql instead of merging directly. |
Happened here as well: #391329. |
|
MattSturgeon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this workflow on master, before enabling the actual ruleset requirement in the repo settings. Even if we change our minds about using merge queues, this workflow can easily be removed later. Therefore I don't think this is blocked by agreeing to require merge queues.
The needs list can evolve over time, so starting initially with just lint seems reasonable. Same for the PR needs list; no changes for now seems reasonable.
Most discussion on this PR has been focused on how we will use merge queues going forward, so it shouldn't block merging this initial workflow now.
LGTM.
|
I added the open TODO for nixpkgs-merge-bot to NixOS/org#149 to prevent enabling merge queues without support for that. |
|
Successfully created backport PR for |
Introduces a basic merge queue workflow to initially only run lints. This will avoid accidentally merging changes which break nixfmt after its recent update to 1.0.0.
cc @emilazy who asked for it in #428350 (comment) and #430675 (comment).
Things done
Add a 👍 reaction to pull requests you find important.