
Conversation

Contributor

@wolfgangwalther wolfgangwalther commented Aug 5, 2025

Introduces a basic merge queue workflow to initially only run lints. This will avoid accidentally merging changes which break nixfmt after its recent update to 1.0.0.
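
For orientation, a minimal sketch of what such a merge-queue workflow could look like; the file name, job names, and lint invocation are illustrative placeholders, not the exact contents of this PR:

```yaml
# Hypothetical sketch only; the actual workflow added by this PR may differ.
name: Merge queue

on:
  merge_group:

permissions: {}

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Placeholder for the actual lint invocation (e.g. the nixfmt check).
      - run: ./ci/lint.sh
```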

cc @emilazy who asked for it in #428350 (comment) and #430675 (comment).


@wolfgangwalther wolfgangwalther mentioned this pull request Aug 5, 2025
@nixpkgs-ci nixpkgs-ci bot added the labels 10.rebuild-darwin: 0, 10.rebuild-linux: 0, 6.topic: continuous integration, 6.topic: policy discussion, and backport release-25.05 on Aug 5, 2025
Member

@emilazy emilazy left a comment

Thank you! 💜

I hope we can get eval in here soon!

Member

We probably don’t want the actual list of required jobs to be the same, though. GitHub gates merge queue entry on required checks succeeding, even if they’re redundant to the ones checked in the merge queue. This is annoying and slows down merge velocity for no good reason – e.g. if you force‐push to fix a trivial typo, hit merge, and have the same required jobs on both PRs and in the merge queue, you have to wait 2 × cycle time for merge for no reason.

So, in the long run, if we check everything in CI, we probably shouldn’t gate queue entry at all. For now, we should probably just remove things that are also checked in the queue.

Contributor

@MattSturgeon MattSturgeon Aug 5, 2025

Correct. This comment was about the name, not the list.

Contributor Author

We probably don’t want the actual list of required jobs to be the same, though.

This comment was only about the name. This needs to be the same, even if the checks contained in the PR / Merge Group workflows are not the same, or even entirely different. The comment has meanwhile been removed anyway.

GitHub gates merge queue entry on required checks succeeding, even if they’re redundant to the ones checked in the merge queue. This is annoying and slows down merge velocity for no good reason – e.g. if you force‐push to fix a trivial typo, hit merge, and have the same required jobs on both PRs and in the merge queue, you have to wait 2 × cycle time for merge for no reason.

Once we add Eval, this is unavoidable. We need to run Eval in the PR, to be able to label based on the results. And to run nixpkgs-review, etc.

And we surely want to run Eval in the merge queue as well.

Since Eval takes the most time, we can run all the other checks as well. This is just a matter of resources, not of cycle time.

So, in the long run, if we check everything in CI, we probably shouldn’t gate queue entry at all. For now, we should probably just remove things that are also checked in the queue.

I assume you mean to say that we'd still run these checks in the PR, but not require them for addition to the merge queue. The problem with that is that you can't use auto-merge anymore, which you can do right now already. I sure wouldn't want to potentially break other people's merge queue whenever I intend to use auto-merge.

Contributor

So, in the long run, if we check everything in CI, we probably shouldn’t gate queue entry at all.

I'm skeptical of this assertion. We should still run (most) CI on unfinished/unqueued PRs, as this is a much better feedback cycle than enqueuing a PR and failing a group of PRs just to see what CI would do.

Even if you meant "run CI, but don't require it," I'm also skeptical of that: if we're running CI anyway, and it fails, why would we want to enqueue that PR?

At that point, the failing PR needs some attention; maybe just a rebase, maybe some actual changes, or maybe CI is broken. Either way, it shouldn't be enqueued in that state.

I could envisage postponing slow or expensive jobs until the final check in the queue, though. Assuming they are unlikely to need attention during a normal PR's lifecycle. There's a balance to be struck here between saving time+resources and only noticing CI issues after a PR is queued.

Contributor

@MattSturgeon MattSturgeon Aug 5, 2025

So, in the long run, if we check everything in CI, we probably shouldn’t gate queue entry at all. For now, we should probably just remove things that are also checked in the queue.

I assume you mean to say that we'd still run these checks in the PR, but not require them for addition to the merge queue. The problem with that is that you can't use auto-merge anymore, which you can do right now already. I sure wouldn't want to potentially break other people's merge queue whenever I intend to use auto-merge.

I don't think GitHub's rulesets even properly support configuring required status checks separately for the PR and the queue. Even if we wanted to (which I don't think we do).

We can configure how each version runs, based on event (pull_request vs merge_group), but we cannot have it required only for the queue. The closest we could get to that would be to unconditionally skip the "no PR failures" job on the pull_request event, such that it is always considered successful.
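
To illustrate that workaround, a hedged sketch (job names are made up and this is not the actual pr.yml): the aggregate job is skipped on pull_request, so a ruleset requiring it would, in effect, only bite in the merge queue.

```yaml
on:
  pull_request:
  merge_group:

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - run: echo "expensive checks here"  # placeholder

  no-pr-failures:
    # Unconditionally skipped on pull_request, so it only gates merge_group runs.
    if: github.event_name == 'merge_group'
    needs: [eval]
    runs-on: ubuntu-latest
    steps:
      - run: echo "all required jobs succeeded"
```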

Member

We'd need to be very careful not to do the same the other way around, aka canceling the merge group workflows when new events happen in the PR. Or alternatively... this could be a feature, leading to automatic cancellation of the merge queue for this PR.

FWIW, you have to do it explicitly in code, the concurrency stuff doesn’t support it. So we could implement whatever policy we wanted here, really.
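
For context, a typical concurrency block is keyed on the ref, and merge-queue runs execute on a temporary gh-readonly-queue/... branch rather than the PR's ref, so built-in cancel-in-progress never crosses between the two event types; any cross-cancellation would have to be scripted explicitly:

```yaml
concurrency:
  # pull_request and merge_group runs always land in different groups,
  # because merge_group runs use a gh-readonly-queue/... ref.
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```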

That's not going to work well, I think. The merge_group event context doesn't have information about the Pull Request in question.

Also, the workflow runs on the commit SHA of the merged commit, so we can't find the artifacts for this workflow with the labeler job. Thus anything that Eval produces here is useless for labeling purposes. Labeling will have to happen in the PR.

Hmm. GitHub’s UI shows what PRs a commit is part of (so I’d presume it’s accessible through the API too), and we can descend from the merge commit into the commits being merged. So I think looking up the corresponding PR is viable. Whether it’s worth the effort, I don’t know.
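
A hedged sketch of that lookup via the REST endpoint that lists pull requests associated with a commit; whether it is robust enough for commits in a merge group is exactly the open question here:

```yaml
jobs:
  find-pr:
    runs-on: ubuntu-latest
    steps:
      - name: Look up the PRs a commit belongs to
        env:
          GH_TOKEN: ${{ github.token }}
          COMMIT: ${{ github.sha }}  # or any commit taken from the merge group range
        run: |
          gh api "repos/${{ github.repository }}/commits/$COMMIT/pulls" --jq '.[].number'
```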

I see your point for conflict-prone automated treewides. I wonder how they interact with merge queues and the "jump the queue" feature, though. I assume you should be able to:

  • cancel the existing queue
  • rebase on latest master and run your automation
  • run CI in your PR
  • jump the queue

And all of that with a rather low probability of hitting another merge conflict. But maybe that doesn't work well.

My experience was that waiting for eval to see rebuild numbers was somewhat likely to lead to more conflicts in the truly big treewides even before we had required checks or the merge queue. OTOH I think eval got faster since then.

But mostly I think it’s strictly redundant; if we’re checking things in a queue, everything redundant to that outside of the queue ultimately has strictly advisory lint status. We can’t stop people putting up broken PRs; what we care about is preventing them from landing. Merge queues inherently result in some decrease of merge velocity, and I think we can get some of it back by not blocking entry into the queue and cancelling redundant jobs.

Hm, not entirely true. We do have some kinds of checks that are semi-required by design. Currently that's the "check cherry-picks" job. We require this job to run and succeed (for the error cases) and we require the committer to dismiss the automated review (for the warning cases). We surely want to run this check to completion before merge.

It could still just boot the PR out of the queue if it’s not happy, right? But I agree that it’s benign to require checks that should only take a few seconds anyway.

Contributor

@MattSturgeon MattSturgeon Aug 5, 2025

I guess we could explore ways to make things like eval only required for merge_group, while still having faster things like linting and cherry-pick validation required in pr.yml.

I'm still not convinced either way, but the example of conflict-prone PRs is valid.

IMO fine-tuning which checks run and/or are required by each workflow is something that should be done after we have the initial Merge Queue setup.


Wild idea: would it be reasonable to configure some of the required checks using labels? E.g. you could have a "conflict prone" label that tells pr.yml to skip some of its slower checks, or at least not require them in the "no PR failures" job.

That way it's an explicit opt-in to skip certain checks until the merge_group run, while other PRs could still use an "auto merge" that only enters the queue once CI looks green.
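
A sketch of what such a label opt-in could look like in pr.yml; the label and job names are invented for illustration:

```yaml
jobs:
  eval:
    # Skipped when the PR opts out via a hypothetical "conflict prone" label;
    # the check would then only run in the merge queue.
    if: ${{ !contains(github.event.pull_request.labels.*.name, 'conflict prone') }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "slow checks here"  # placeholder
```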

Member

I just think that this is plainly a bug of GitHub merge queues. Like, I do not think that if you were designing a code review and merge queue system from scratch you would come up with the idea that even assessing a PR for merge should gate on the same checks outside the queue. The queue is the gate. Being able to optimistically enter into the queue and check changes efficiently while still enforcing monotonic properties of the merged branch is the advantage – otherwise you could achieve the same goal with required PR checks and enforcing all PRs to be up to date with HEAD. GitHub required checks are just designed wrong. If you look at the thread where they announced it there’s a bunch of people pointing out this bizarre misfeature, FWIW.

Member

(To be clear, none of this is a blocker. But getting monotonic properties on our main branches while still having maximum concurrency of checks and merge velocity is why I’m so eager to see us move stuff into queues.)

Contributor Author

Lots of stuff to think about, won't comment on most of it, but will keep thinking about this.

It could still just boot the PR out of the queue if it’s not happy, right?

If we can't detect the boundaries of where a PR starts in the list of commits between base...head of the merge queue, then we can't say which PR to boot out of the queue. We might kick out other PRs that we didn't intend to.

(This is all about checks which don't check the current head commit, but each commit on its own, as is required for linting commit messages and such.)
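
As a sketch of that class of check (assuming the base_sha/head_sha fields of the merge_group event payload), the job below can enumerate every commit in the queued range, but nothing in it tells us where one PR's commits end and the next one's begin:

```yaml
jobs:
  commit-lint:
    if: github.event_name == 'merge_group'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so the base..head range resolves
      - run: |
          git log --format=%H \
            "${{ github.event.merge_group.base_sha }}..${{ github.event.merge_group.head_sha }}"
```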

Of course, if we can find out which commits belong to each PR, this would be possible. But from experience so far, the following is certainly not easy:

Hmm. GitHub’s UI shows what PRs a commit is part of (so I’d presume it’s accessible through the API too), and we can descend from the merge commit into the commits being merged. So I think looking up the corresponding PR is viable.

Contributor

@MattSturgeon MattSturgeon left a comment

LGTM, thanks

@nixpkgs-ci nixpkgs-ci bot added the 12.approvals: 1 label on Aug 5, 2025
@wolfgangwalther
Contributor Author

Adding the merge queue essentially means that we would take away the ability from committers to bypass "required status checks" in PRs, see discussion in NixOS/org#149 (comment).

This shouldn't stop this PR from being merged, though. It's rather simple and the remaining discussion is only about how to actually implement the branch ruleset (for now).

Contributor

jfly commented Aug 7, 2025

Does introducing this require a corresponding change to the nixpkgs-merge-bot?

@wolfgangwalther
Contributor Author

Does introducing this require a corresponding change to the nixpkgs-merge-bot?

That's a very important point, thanks for raising it. I looked into it:

My conclusion: The nixpkgs-merge-bot would very likely try to bypass the merge queue every time. Since the bot doesn't seem to be part of the nixpkgs-committers team, it would fail to do so. The merge-bot would be broken.

I think the merge-bot needs to replicate the logic of the CLI tool: check whether the PR needs a merge queue and, if so, enable auto-merge via GraphQL instead of merging directly.
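
Purely as illustration (the merge-bot is its own service, not a workflow, and would make the equivalent API call from its own code): enabling auto-merge rather than merging directly corresponds roughly to the following, with the enablePullRequestAutoMerge mutation underneath.

```yaml
jobs:
  enable-auto-merge:
    runs-on: ubuntu-latest
    steps:
      - env:
          GH_TOKEN: ${{ github.token }}
          PR: ${{ github.event.pull_request.number }}
        # Enables auto-merge: the PR is queued/merged once required checks allow it.
        run: gh pr merge "$PR" --auto --merge --repo "${{ github.repository }}"
```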

@wolfgangwalther
Contributor Author

[happened] in #428350 (comment) and #430675 (comment).

Happened here as well: #391329.

Contributor

zowoq commented Aug 12, 2025

[happened] in #428350 (comment) and #430675 (comment).

Happened here as well: #391329.

And in #426382, fixed in #432998.

@wolfgangwalther wolfgangwalther mentioned this pull request Aug 12, 2025
Contributor

@MattSturgeon MattSturgeon left a comment

We need this workflow on master, before enabling the actual ruleset requirement in the repo settings. Even if we change our minds about using merge queues, this workflow can easily be removed later. Therefore I don't think this is blocked by agreeing to require merge queues.

The needs list can evolve over time, so starting with just lint seems reasonable. Same for the PR needs list; no changes for now seems reasonable.

Most discussion on this PR has been focused on how we will use merge queues going forward, so it shouldn't block merging this initial workflow now.

LGTM.

@wolfgangwalther
Contributor Author

I added the open TODO for nixpkgs-merge-bot to NixOS/org#149 to prevent enabling merge queues without support for that.

@wolfgangwalther wolfgangwalther merged commit a895a97 into NixOS:master Aug 14, 2025
24 checks passed
@wolfgangwalther wolfgangwalther deleted the ci-merge-queue branch August 14, 2025 07:28
Contributor

nixpkgs-ci bot commented Aug 14, 2025

Successfully created backport PR for release-25.05:

@github-actions github-actions bot added the 8.has: port to stable label on Aug 14, 2025
@zowoq zowoq mentioned this pull request Aug 24, 2025
