-
Notifications
You must be signed in to change notification settings - Fork 89
Test lean CI PR run option #540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
WalkthroughAdds Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
🔇 Additional comments (4)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/ok to test abc92ea |
|
/ok to test 3a79e12 |
|
/ok to test bdf1ba9 |
|
/ok to test bb5a483 |
|
/ok to test a73e916 |
|
/ok to test b7a765d |
|
/ok to test 7f01697 |
|
/ok to test f3394f3 |
…ublic into add_lean_ci_option
|
/ok to test eac0671 |
|
@msarahan may I get your review on this PR |
|
@gforsyth May I get your review on this PR |
.github/workflows/pr.yaml
Outdated
| env: | ||
| # Set to 'true' for lean CI (amd64, latest Python/CUDA only) or 'false' for full CI (all platforms) | ||
| LEAN_CI: 'false' | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is going to work.
First, you can imagine the headache of constantly switching this back and forth on a per-PR basis.
But more importantly, when PRs are opened from forks, the workflows aren't run from changes to that fork, they're run from the base repo to prevent malicious changes to CI.
These changes would only take effect after the PR is merged, which is too late.
If you want to have variable CI runs, I would recommend instead setting up some conditions where the matrix_filter gets adjusted if a specific label is applied to a PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This is mostly meant for long running PRs which span across several weeks and also to keep the run short so that we don't spend too much time on running whole matrix.
- This PR is created from fork and I was able to run my changes in this PR, I think authorized user is the key on running or stopping a PR
- I like the idea on label, let me explore that. Thank you for the tip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
.github/workflows/pr.yaml (2)
56-58: Add defensive checks around PR number extraction.The sed extraction of PR number from
github.refassumes a well-formed pattern (e.g.,refs/heads/pull-request/123), but includes no error handling if the pattern doesn't match or the environment variable is malformed. If this fails silently, subsequentgh pr viewcommands will produce cryptic errors.Add validation after extraction:
PR_NUMBER=$(echo "${{ github.ref }}" | sed 's|refs/heads/pull-request/||') if ! [[ "$PR_NUMBER" =~ ^[0-9]+$ ]]; then echo "ERROR: Failed to extract valid PR number from ref: ${{ github.ref }}" exit 1 fiThis ensures failures are clear and immediate.
Also applies to: 85-87
97-116: Add inline documentation for matrix filter logic.The matrix filter expressions (especially non-lean mode) use complex jq syntax with multiple grouping and selection strategies. Without context, it's unclear why each component has a different filter strategy or what the lean vs non-lean modes achieve for each.
Consider adding a comment block explaining each filter:
- name: Set matrix filters id: set-filters run: | if [ "${{ env.LEAN_CI }}" == "true" ]; then # Lean CI: select single configuration (amd64 + specific Python + latest CUDA) echo "conda_lean_filter=..." >> $GITHUB_OUTPUT # C++ build: amd64, PY 3.10, latest CUDA echo "conda_test_filter=..." >> $GITHUB_OUTPUT # Python test: amd64, PY 3.13, latest CUDA # ... etc for other components else # Full CI: select representative configurations per component echo "conda_lean_filter=." >> $GITHUB_OUTPUT # No filter (all combos) echo "libcuopt_filter=..." >> $GITHUB_OUTPUT # One Python per CUDA major version # ... etc fiAlso, clarify in a comment why
cuopt_sh_client_filterusesmin_bywith negation (line 115) rather than direct selection.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/pr.yaml(7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: wheel-build-libcuopt / 13.0.1, 3.12, amd64, rockylinux8
- GitHub Check: conda-cpp-build / 13.0.1, 3.10, amd64, rockylinux8
🔇 Additional comments (3)
.github/workflows/pr.yaml (3)
30-31: Dependency chain looks sound.Adding
check-lean-cias a hard gate ensures PRs with the lean-ci label cannot merge, andcompute-matrix-filtersas an orchestrator ensures all downstream jobs receive consistent filter outputs.
189-212: Verify consistency of filter naming across build and test jobs.The
conda-cpp-buildjob (line 195) usesconda_lean_filter, whileconda-python-build(line 212) usesconda_test_filter. This naming divergence suggests different selection strategies, but the distinction is unclear from context.Please verify:
- Why cpp-build and python-build use different filters (lean vs test)?
- Whether the filter logic correctly reflects the intended behavior for each component?
49-116: Confirmed: Jobs consume lean filters even when check-lean-ci fails.The contradiction is real. When the
lean-cilabel is present:
check-lean-cifails and blocks thepr-builderjob- However,
check-lean-ciis not a dependency for intermediate jobs likeconda-cpp-build,wheel-build-cuopt, etc.- These intermediate jobs depend only on
compute-matrix-filters, which still outputs lean filters- As a result, CI runs in lean mode (reduced matrix), but then fails at merge time due to
check-lean-ciDesign issue: If lean-ci is meant only for testing and should never reach merge, either:
- Remove
check-lean-cientirely and let lean CI run during development, or- Make intermediate jobs also depend on
check-lean-ciso CI fails early before wasting resourcesCurrently, the workflow wastes CI resources running lean-mode tests only to reject the PR at the end. Clarify the intended workflow and align job dependencies accordingly.
| check-lean-ci: | ||
| runs-on: ubuntu-latest | ||
| steps: | ||
| - name: Check for lean-ci label and fail if present | ||
| env: | ||
| GH_TOKEN: ${{ github.token }} | ||
| run: | | ||
| # Extract PR number from branch name (pull-request/123 -> 123) | ||
| PR_NUMBER=$(echo "${{ github.ref }}" | sed 's|refs/heads/pull-request/||') | ||
| echo "Checking PR #$PR_NUMBER for lean-ci label..." | ||
| # Check if the PR has the 'lean-ci' label | ||
| if gh pr view "$PR_NUMBER" --repo "${{ github.repository }}" --json labels --jq '.labels[].name' | grep -q "^lean-ci$"; then | ||
| echo "❌ ERROR: This PR has the 'lean-ci' label enabled." | ||
| echo "Lean CI is only for testing purposes and should not be merged." | ||
| echo "Please remove the 'lean-ci' label and run full CI before merging." | ||
| exit 1 | ||
| else | ||
| echo "✅ Full CI is enabled. PR can be merged once all checks pass." | ||
| fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical architecture issue: Fork-based PRs won't use updated workflow.
As noted in the previous review by @gforsyth, GitHub Actions workflows triggered from fork-based PRs are executed using the workflow files from the base repository, not the fork. This means the new check-lean-ci and compute-matrix-filters jobs introduced here won't actually run on PRs opened from forks until the changes are merged to main—at which point it's too late to validate the lean-ci mode on that PR.
Only authorized users (pushing to the base repo directly or in pushes to base repo branches) will see these checks apply. This severely limits the utility of the feature for the broader contributor base and creates a confusing UX where the feature works for some but not others.
Consider an alternative approach such as:
- Using workflow inputs or repository settings (which take effect immediately) instead of branch-based workflow changes.
- Documenting this limitation prominently so contributors understand why the checks may not appear on their fork-based PRs.
- Requiring lean-ci runs only for maintainer-initiated PRs or pre-merge validation steps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They have run in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CodeRabbit I was able to run the changes in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rgsl888prabhu You're absolutely right! Since this PR is from a branch within the same repository (not a fork), the workflow changes do run as expected.
My concern was specifically about external contributors who would open PRs from forks—they wouldn't see these workflow changes until after they're merged to main. However, looking at the PR description mentioning this is "intended for EPIC PRs," it seems this feature is primarily designed for maintainer/internal use cases where PRs are opened from same-repo branches.
If the lean-ci feature is mainly for internal testing and EPIC PRs (which would typically come from same-repo branches), then the fork limitation isn't a practical concern. Thanks for clarifying! 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CodeRabbit I have opened this PR from fork rgsl888prabhu:add_lean_ci_option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the PR details to verify if it's from a fork or same-repo branch
gh pr view 540 --repo NVIDIA/cuopt --json headRepositoryOwner,headRepository,headRefName,baseRepository,baseRefName
Length of output: 675
🏁 Script executed:
#!/bin/bash
# Check if this is a cross-repository (fork) PR
gh pr view 540 --repo NVIDIA/cuopt --json isCrossRepository,headRepositoryOwner,headRepository,headRefName,baseRefName
Length of output: 260
@rgsl888prabhu You're absolutely correct, and I apologize for the confusion! 🙏
I verified with the GitHub CLI and confirmed this is indeed a cross-repository (fork) PR from rgsl888prabhu/cuopt_public to NVIDIA/cuopt.
I was wrong about the workflow behavior. The key difference here is the trigger type used in this workflow:
on:
push:
branches:
- "pull-request/[0-9]+"This workflow uses a push trigger on the pull-request/[0-9]+ branch pattern, not a pull_request trigger. When you open a PR from your fork:
- GitHub creates a merge branch named
pull-request/540in the base repository (NVIDIA/cuopt) - That branch contains your updated workflow files
- Pushes to your fork branch update this merge branch
- The
pushtrigger fires in the base repo context with the updated workflows
This is different from workflows using pull_request triggers, which do use the base branch's workflow files for security. Your setup actually allows workflow changes to be tested in fork PRs before merging, which is exactly what you're doing here!
Thank you for the correction and for persisting with the clarification. The workflow design is working as intended! ✅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gforsyth for viz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/workflows/pr.yaml (1)
85-116: Consider clarifying filter naming and documenting jq selection logic.The seven matrix filter outputs have names like
conda_lean_filterandconda_test_filterthat suggest mode-specificity, but they are actually context-dependent filters that vary by job type and mode. Additionally, the non-lean jq expressions (lines 112–115) use different selection criteria (grouping by arch/python, selecting max CUDA vs max Python) with no documentation explaining the rationale:
- Line 112:
mps_parser_filtergroups by arch+pyver, selects max CUDA per group- Line 113:
libcuopt_filtergroups by arch+CUDA major, selects max pyver per group- Line 114:
cuopt_server_filterfilters amd64, groups by CUDA major, selects by both pyver and CUDA- Line 115:
cuopt_sh_client_filterfilters amd64, selects min by pyver then negated CUDA (oldest Python, newest CUDA)These inconsistencies may be intentional (different packages have different matrix needs), but they are not self-documenting. Consider adding brief inline comments explaining the selection strategy for each filter so maintainers understand the constraints.
Example:
run: | if [ "${{ needs.check-lean-ci.outputs.lean_ci_enabled }}" == "true" ]; then # Lean mode: single build per job type, fixed Python versions, latest CUDA echo "conda_lean_filter=[map(select(.ARCH == \"amd64\" and .PY_VER == \"3.10\")) | max_by(.CUDA_VER | split(\".\") | map(tonumber))]" >> $GITHUB_OUTPUT # ... rest of lean filters else # Full CI: multiple combinations per package requirement # mps_parser: one per arch+python combo, latest CUDA for each echo "mps_parser_filter=group_by([.ARCH, (.PY_VER |split(\".\") | map(tonumber))])|map(max_by([(.CUDA_VER|split(\".\")|map(tonumber))]))" >> $GITHUB_OUTPUT # libcuopt: one per arch+CUDA major, latest Python for each echo "libcuopt_filter=group_by([.ARCH, (.CUDA_VER|split(\".\")|map(tonumber)|.[0])]) | map(max_by(.PY_VER|split(\".\")|map(tonumber)))" >> $GITHUB_OUTPUT # cuopt_server: amd64 only, grouped by CUDA major, latest Python per major version echo "cuopt_server_filter=map(select(.ARCH == \"amd64\")) | group_by(.CUDA_VER|split(\".\")|map(tonumber)|.[0]) | map(max_by([(.PY_VER|split(\".\")|map(tonumber)), (.CUDA_VER|split(\".\")|map(tonumber))]))" >> $GITHUB_OUTPUT # cuopt_sh_client: amd64 only, oldest Python + newest CUDA (noarch package) echo "cuopt_sh_client_filter=[map(select(.ARCH == \"amd64\")) | min_by((.PY_VER | split(\".\") | map(tonumber)), (.CUDA_VER | split(\".\") | map(-tonumber)))]" >> $GITHUB_OUTPUT fi
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/pr.yaml(7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: wheel-build-libcuopt / 13.0.1, 3.12, amd64, rockylinux8
🔇 Additional comments (5)
.github/workflows/pr.yaml (5)
51-72: Clear lean-ci detection logic—good encapsulation.The
check-lean-cijob cleanly extracts and evaluates the label, and the output is properly consumed downstream. The logic is maintainable and the console messages provide good visibility.
74-84: Merge guard prevents inadvertent lean-ci merges.Good safety measure. The job fails clearly with informative messaging if the lean-ci label is present, protecting against incomplete CI runs being merged.
29-32: Verify pr-builder semantics withif: always().The pr-builder job now depends on the three new orchestration jobs and uses
if: always(). This allows pr-builder to run even if some dependencies fail, which is necessary for it to aggregate and report the overall status correctly (including failures fromprevent-merge-with-lean-ci). However, confirm this aligns with the shared-workflows' pr-builder expectations and that lean-ci failures properly block the PR.Also applies to: 50-50
189-310: Downstream job wiring looks correct.All conda and wheel build/test jobs now depend on
compute-matrix-filtersand consume the appropriate filter outputs. The dependency chains properly enforce ordering (e.g.,wheel-build-libcuoptwaits forwheel-build-cuopt-mps-parser).
312-312: Verify test-self-hosted-server intentionally omits compute-matrix-filters.The
test-self-hosted-serverjob does not depend oncompute-matrix-filtersand has nomatrix_filterinput. Confirm this is intentional (e.g., the job has no matrix definition or uses a fixed configuration) rather than an oversight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/workflows/pr.yaml (1)
28-49: Job dependencies are properly wired.The three new lean-ci jobs are correctly added to
pr-builderneeds, ensuring the aggregator won't succeed until all checks (including lean-ci validation) complete.Consider grouping related dependencies for clarity:
needs: # Lean CI checks - check-lean-ci - prevent-merge-with-lean-ci - compute-matrix-filters # Standard checks - changed-files - checks # Build jobs - conda-cpp-build # ... etc
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/pr.yaml(7 hunks)
🔇 Additional comments (6)
.github/workflows/pr.yaml (6)
50-72: Lean CI label detection logic is clear and functional.The PR label check is well-implemented with proper token handling, clear messaging, and correct output formatting for downstream consumption.
73-83: Merge guard is appropriately configured.The
prevent-merge-with-lean-cijob correctly gatekeeps PRs with the lean-ci label, providing clear remediation instructions.
188-221: Matrix filters are consistently propagated to conda jobs.All conda jobs correctly receive
compute-matrix-filtersas a dependency and appropriate filter outputs. However, note a potential naming ambiguity:Line 211 (
conda-python-build) and line 221 (conda-python-tests) both useconda_test_filter. While this is intentional (both should use the same Python version), the nameconda_test_filtermight mislead future maintainers into thinking it's only for tests.Consider renaming
conda_test_filtertoconda_python_filterorconda_test_py_filterto clarify it applies to both python-build and python-tests steps. This would avoid confusion between:
conda_lean_filter(cpp-build, Python 3.10)conda_test_filter(both python-build and python-tests, Python 3.13)echo "conda_lean_filter=[map(select(.ARCH == \"amd64\" and .PY_VER == \"3.10\")) | max_by(.CUDA_VER | split(\".\") | map(tonumber))]" >> $GITHUB_OUTPUT -echo "conda_test_filter=[map(select(.ARCH == \"amd64\" and .PY_VER == \"3.13\")) | max_by(.CUDA_VER | split(\".\") | map(tonumber))]" >> $GITHUB_OUTPUT +echo "conda_python_filter=[map(select(.ARCH == \"amd64\" and .PY_VER == \"3.13\")) | max_by(.CUDA_VER | split(\".\") | map(tonumber))]" >> $GITHUB_OUTPUTThen update all references to
conda_test_filteracross lines 101, 109, 203, 211, and 221.
222-234: Verify thatdocs-buildintentionally skips lean-ci filtering.The
docs-buildjob does not depend on or receive configuration fromcompute-matrix-filters. If this job has no matrix dimensions, this is fine. However, if it should respect lean-ci mode, it needs the dependency and amatrix_filterinput.Does
docs-build(and its underlyingcustom-job.yamlworkflow) support amatrix_filterinput? If so, should it be added here?
310-316: Verify thattest-self-hosted-serverintentionally skips lean-ci filtering.Similar to
docs-build, thetest-self-hosted-serverjob does not depend oncompute-matrix-filters. If this is a fixed, single-run job, it's appropriate. However, if it has matrix dimensions that should be filtered under lean-ci mode, it needs the dependency and configuration.Does
test-self-hosted-server(and its underlying workflow) support amatrix_filterinput? If so, should it be gated by lean-ci mode?
84-115: All jq filters are syntactically valid and correctly applied.Verification confirms:
- All 7 matrix filters pass jq syntax validation with proper array inputs
- Filters are correctly referenced in downstream jobs (cpp-build, cpp-tests, python builds/tests, wheel operations)
- Non-lean mode filters properly handle full matrix combinations
However, the Python version asymmetry remains unverified:
- Lean mode assigns 3.10 for cpp-build but 3.13 for cpp-tests (and python operations)
- This could be intentional (test compatibility across Python versions) or an oversight
Confirm whether the cpp-build→3.10 and cpp-tests→3.13 split is by design, or should both use the same version.
|
@gforsyth May I get your review on this PR |
gforsyth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I like the extra guard against merging in PRs with lean-ci set
Description
This adds an option to run PRs in lean mode where it runs only for amd64, one python version and a cuda version.
This is meant for EPIC PRs which need to do minimal CI runs to make sure there are not major breaking issues.
Issue
Checklist
Summary by CodeRabbit