
Conversation

Contributor

@wolfgangwalther wolfgangwalther commented Aug 5, 2025

Introduces a basic merge queue workflow to initially only run lints. This will avoid accidentally merging changes which break nixfmt after its recent update to 1.0.0.
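
For orientation, a minimal sketch of what such a merge-queue workflow could look like; the file name, job names, and lint invocation are illustrative placeholders, not the exact contents of this PR:

```yaml
# Hypothetical sketch only; the actual workflow added by this PR may differ.
name: Merge queue

on:
  merge_group:

permissions: {}

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Placeholder for the actual lint invocation (e.g. the nixfmt check).
      - run: ./ci/lint.sh
```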

cc @emilazy who asked for it in #428350 (comment) and #430675 (comment).


@wolfgangwalther wolfgangwalther mentioned this pull request Aug 5, 2025
@nixpkgs-ci nixpkgs-ci bot added the labels 10.rebuild-darwin: 0, 10.rebuild-linux: 0, 6.topic: continuous integration, 6.topic: policy discussion, and backport release-25.05 on Aug 5, 2025
Member

@emilazy emilazy left a comment

Thank you! 💜

I hope we can get eval in here soon!

Member

We probably don’t want the actual list of required jobs to be the same, though. GitHub gates merge queue entry on required checks succeeding, even if they’re redundant to the ones checked in the merge queue. This is annoying and slows down merge velocity for no good reason – e.g. if you force‐push to fix a trivial typo, hit merge, and have the same required jobs on both PRs and in the merge queue, you have to wait 2 × cycle time for merge for no reason.

So, in the long run, if we check everything in CI, we probably shouldn’t gate queue entry at all. For now, we should probably just remove things that are also checked in the queue.

Contributor

@MattSturgeon MattSturgeon Aug 5, 2025

Correct. This comment was about the name, not the list.

Contributor Author

We probably don’t want the actual list of required jobs to be the same, though.

This comment was only about the name. This needs to be the same, even if the checks contained in the PR / Merge Group workflows are not the same, or even entirely different. The comment has meanwhile been removed anyway.

GitHub gates merge queue entry on required checks succeeding, even if they’re redundant to the ones checked in the merge queue. This is annoying and slows down merge velocity for no good reason – e.g. if you force‐push to fix a trivial typo, hit merge, and have the same required jobs on both PRs and in the merge queue, you have to wait 2 × cycle time for merge for no reason.

Once we add Eval, this is unavoidable. We need to run Eval in the PR, to be able to label based on the results. And to run nixpkgs-review, etc.

And we surely want to run Eval in the merge queue as well.

Since Eval takes the most time, we can run all the other checks as well. This is just a matter of resources, not of cycle time.

So, in the long run, if we check everything in CI, we probably shouldn’t gate queue entry at all. For now, we should probably just remove things that are also checked in the queue.

I assume you mean to say that we'd still run these checks in the PR, but not require them for addition to the merge queue. The problem with that is that you can't use auto-merge anymore, which you can do right now already. I sure wouldn't want to potentially break other people's merge queue whenever I intend to use auto-merge.

Contributor

So, in the long run, if we check everything in CI, we probably shouldn’t gate queue entry at all.

I'm skeptical of this assertion. We should still run (most) CI on unfinished/unqueued PRs, as this is a much better feedback cycle than enqueuing a PR and failing a group of PRs just to see what CI would do.

Even if you meant "run CI, but don't require it," I'm also skeptical of that: if we're running CI anyway, and it fails, why would we want to enqueue that PR?

At that point, the failing PR needs some attention; maybe just a rebase, maybe some actual changes, or maybe CI is broken. Either way, it shouldn't be enqueued in that state.

I could envisage postponing slow or expensive jobs until the final check in the queue, though. Assuming they are unlikely to need attention during a normal PR's lifecycle. There's a balance to be struck here between saving time+resources and only noticing CI issues after a PR is queued.

Contributor

@MattSturgeon MattSturgeon Aug 5, 2025

So, in the long run, if we check everything in CI, we probably shouldn’t gate queue entry at all. For now, we should probably just remove things that are also checked in the queue.

I assume you mean to say that we'd still run these checks in the PR, but not require them for addition to the merge queue. The problem with that is that you can't use auto-merge anymore, which you can do right now already. I sure wouldn't want to potentially break other people's merge queue whenever I intend to use auto-merge.

I don't think GitHub's rulesets even properly support configuring required status checks separately for the PR and the queue. Even if we wanted to (which I don't think we do).

We can configure how each version runs, based on event (pull_request vs merge_group), but we cannot have it required only for the queue. The closest we could get to that would be to unconditionally skip the "no PR failures" job on the pull_request event, such that it is always considered successful.
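
To illustrate that workaround, a hedged sketch (job names are made up and this is not the actual pr.yml): the aggregate job is skipped on pull_request, so a ruleset requiring it would, in effect, only bite in the merge queue.

```yaml
on:
  pull_request:
  merge_group:

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - run: echo "expensive checks here"  # placeholder

  no-pr-failures:
    # Unconditionally skipped on pull_request, so it only gates merge_group runs.
    if: github.event_name == 'merge_group'
    needs: [eval]
    runs-on: ubuntu-latest
    steps:
      - run: echo "all required jobs succeeded"
```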

Member

We'd need to be very careful not to do the same the other way around, aka canceling the merge group workflows when new events happen in the PR. Or alternatively... this could be a feature, leading to automatic cancellation of the merge queue for this PR.

FWIW, you have to do it explicitly in code, the concurrency stuff doesn’t support it. So we could implement whatever policy we wanted here, really.
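
For context, a typical concurrency block is keyed on the ref, and merge-queue runs execute on a temporary gh-readonly-queue/... branch rather than the PR's ref, so built-in cancel-in-progress never crosses between the two event types; any cross-cancellation would have to be scripted explicitly:

```yaml
concurrency:
  # pull_request and merge_group runs always land in different groups,
  # because merge_group runs use a gh-readonly-queue/... ref.
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```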

That's not going to work well, I think. The merge_group event context doesn't have information about the Pull Request in question.

Also, the workflow runs on the commit SHA of the merged commit, so we can't find the artifacts for this workflow with the labeler job. Thus anything that Eval produces here is useless for labeling purposes. Labeling will have to happen in the PR.

Hmm. GitHub’s UI shows what PRs a commit is part of (so I’d presume it’s accessible through the API too), and we can descend from the merge commit into the commits being merged. So I think looking up the corresponding PR is viable. Whether it’s worth the effort, I don’t know.
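
A hedged sketch of that lookup via the REST endpoint that lists pull requests associated with a commit; whether it is robust enough for commits in a merge group is exactly the open question here:

```yaml
jobs:
  find-pr:
    runs-on: ubuntu-latest
    steps:
      - name: Look up the PRs a commit belongs to
        env:
          GH_TOKEN: ${{ github.token }}
          COMMIT: ${{ github.sha }}  # or any commit taken from the merge group range
        run: |
          gh api "repos/${{ github.repository }}/commits/$COMMIT/pulls" --jq '.[].number'
```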

I see your point for conflict-prone automated treewides. I wonder how they interact with merge queues and the "jump the queue" feature, though. I assume you should be able to:

  • cancel the existing queue
  • rebase on latest master and run your automation
  • run CI in your PR
  • jump the queue

And all of that with a rather low probability of hitting another merge conflict. But maybe that doesn't work well.

My experience was that waiting for eval to see rebuild numbers was somewhat likely to lead to more conflicts in the truly big treewides even before we had required checks or the merge queue. OTOH I think eval got faster since then.

But mostly I think it’s strictly redundant; if we’re checking things in a queue, everything redundant to that outside of the queue ultimately has strictly advisory lint status. We can’t stop people putting up broken PRs; what we care about is preventing them from landing. Merge queues inherently result in some decrease of merge velocity, and I think we can get some of it back by not blocking entry into the queue and cancelling redundant jobs.

Hm, not entirely true. We do have some kinds of checks that are semi-required by design. Currently that's the "check cherry-picks" job. We require this job to run and succeed (for the error cases) and we require the committer to dismiss the automated review (for the warning cases). We surely want to run this check to completion before merge.

It could still just boot the PR out of the queue if it’s not happy, right? But I agree that it’s benign to require checks that should only take a few seconds anyway.

Contributor

@MattSturgeon MattSturgeon Aug 5, 2025

I guess we could explore ways to make things like eval only required for merge_group, while still having faster things like linting and cherry-pick validation required in pr.yml.

I'm still not convinced either way, but the example of conflict-prone PRs is valid.

IMO fine-tuning which checks run and/or are required by each workflow is something that should be done after we have the initial Merge Queue setup.


Wild idea: would it be reasonable to configure some of the required checks using labels? E.g. you could have a "conflict prone" label that tells pr.yml to skip some of its slower checks, or at least not require them in the "no PR failures" job.

That way it's an explicit opt-in to skip certain checks until the merge_group run, while other PRs could still use an "auto merge" that only enters the queue once CI looks green.
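
A sketch of what such a label opt-in could look like in pr.yml; the label and job names are invented for illustration:

```yaml
jobs:
  eval:
    # Skipped when the PR opts out via a hypothetical "conflict prone" label;
    # the check would then only run in the merge queue.
    if: ${{ !contains(github.event.pull_request.labels.*.name, 'conflict prone') }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "slow checks here"  # placeholder
```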

Member

I just think that this is plainly a bug of GitHub merge queues. Like, I do not think that if you were designing a code review and merge queue system from scratch you would come up with the idea that even assessing a PR for merge should gate on the same checks outside the queue. The queue is the gate. Being able to optimistically enter into the queue and check changes efficiently while still enforcing monotonic properties of the merged branch is the advantage – otherwise you could achieve the same goal with required PR checks and enforcing all PRs to be up to date with HEAD. GitHub required checks are just designed wrong. If you look at the thread where they announced it there’s a bunch of people pointing out this bizarre misfeature, FWIW.

Member

(To be clear, none of this is a blocker. But getting monotonic properties on our main branches while still having maximum concurrency of checks and merge velocity is why I’m so eager to see us move stuff into queues.)

Contributor Author

Lots of stuff to think about, won't comment on most of it, but will keep thinking about this.

It could still just boot the PR out of the queue if it’s not happy, right?

If we can't detect the boundaries of where a PR starts in the list of commits between base...head of the merge queue, then we can't say which PR to boot out of the queue. We might kick out other PRs that we didn't intend to.

(This is all about checks which don't check the current head commit, but each commit on its own, as is required for linting commit messages and such.)
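
As a sketch of that class of check (assuming the base_sha/head_sha fields of the merge_group event payload), the job below can enumerate every commit in the queued range, but nothing in it tells us where one PR's commits end and the next one's begin:

```yaml
jobs:
  commit-lint:
    if: github.event_name == 'merge_group'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so the base..head range resolves
      - run: |
          git log --format=%H \
            "${{ github.event.merge_group.base_sha }}..${{ github.event.merge_group.head_sha }}"
```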

Of course, if we can find out which commits belong to each PR, this would be possible. But from experience so far, the following is certainly not easy:

Hmm. GitHub’s UI shows what PRs a commit is part of (so I’d presume it’s accessible through the API too), and we can descend from the merge commit into the commits being merged. So I think looking up the corresponding PR is viable.

Contributor

@MattSturgeon MattSturgeon left a comment

LGTM, thanks

@nixpkgs-ci nixpkgs-ci bot added the 12.approvals: 1 label on Aug 5, 2025
@wolfgangwalther
Contributor Author

Adding the merge queue essentially means that we would take away the ability from committers to bypass "required status checks" in PRs, see discussion in NixOS/org#149 (comment).

This shouldn't stop this PR from being merged, though. It's rather simple and the remaining discussion is only about how to actually implement the branch ruleset (for now).

Contributor

jfly commented Aug 7, 2025

Does introducing this require a corresponding change to the nixpkgs-merge-bot?

@wolfgangwalther
Contributor Author

Does introducing this require a corresponding change to the nixpkgs-merge-bot?

That's a very important point, thanks for raising it. I looked into it:

My conclusion: The nixpkgs-merge-bot would very likely try to bypass the merge queue every time. Since the bot doesn't seem to be part of the nixpkgs-committers team, it would fail to do so. The merge-bot would be broken.

I think the merge-bot needs to replicate the logic of the CLI tool: check whether the PR needs a merge queue and, if so, enable auto-merge via GraphQL instead of merging directly.
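
Purely as illustration (the merge-bot is its own service, not a workflow, and would make the equivalent API call from its own code): enabling auto-merge rather than merging directly corresponds roughly to the following, with the enablePullRequestAutoMerge mutation underneath.

```yaml
jobs:
  enable-auto-merge:
    runs-on: ubuntu-latest
    steps:
      - env:
          GH_TOKEN: ${{ github.token }}
          PR: ${{ github.event.pull_request.number }}
        # Enables auto-merge: the PR is queued/merged once required checks allow it.
        run: gh pr merge "$PR" --auto --merge --repo "${{ github.repository }}"
```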

@wolfgangwalther
Contributor Author

[happened] in #428350 (comment) and #430675 (comment).

Happened here as well: #391329.

Contributor

zowoq commented Aug 12, 2025

[happened] in #428350 (comment) and #430675 (comment).

Happened here as well: #391329.

And in #426382, fixed in #432998.

@wolfgangwalther wolfgangwalther mentioned this pull request Aug 12, 2025
Contributor

@MattSturgeon MattSturgeon left a comment

We need this workflow on master, before enabling the actual ruleset requirement in the repo settings. Even if we change our minds about using merge queues, this workflow can easily be removed later. Therefore I don't think this is blocked by agreeing to require merge queues.

The needs list can evolve over time, so starting with just lint seems reasonable. Same for the PR needs list; no changes for now seems reasonable.

Most discussion on this PR has been focused on how we will use merge queues going forward, so it shouldn't block merging this initial workflow now.

LGTM.

@wolfgangwalther
Contributor Author

I added the open TODO for nixpkgs-merge-bot to NixOS/org#149 to prevent enabling merge queues without support for that.

@wolfgangwalther wolfgangwalther merged commit a895a97 into NixOS:master Aug 14, 2025
24 checks passed
@wolfgangwalther wolfgangwalther deleted the ci-merge-queue branch August 14, 2025 07:28
Contributor

nixpkgs-ci bot commented Aug 14, 2025

Successfully created backport PR for release-25.05:

@github-actions github-actions bot added the 8.has: port to stable label on Aug 14, 2025
@zowoq zowoq mentioned this pull request Aug 24, 2025
