
fix: Include NVLink and InfiniBand Interfaces while cleaning up Instance resources #366

Merged: hwadekar-nv merged 3 commits into main from fix/ib-nvlink-interface-deletion on Apr 10, 2026



@hwadekar-nv
Contributor

@hwadekar-nv hwadekar-nv commented Apr 9, 2026

  1. Added deletion of NVLink and InfiniBand interfaces to the cleanup function that runs when an instance is being terminated.
  2. Added a DB migration to remove orphaned NVLink and InfiniBand interfaces (those whose instances have been deleted from the database).

Type of Change

  • Feature - New feature or functionality (feat:)
  • Fix - Bug fixes (fix:)
  • Chore - Modification or removal of existing functionality (chore:)
  • Refactor - Refactoring of existing functionality (refactor:)
  • Docs - Changes in documentation or OpenAPI schema (docs:)
  • CI - Changes in GitHub workflows. Requires additional scrutiny (ci:)
  • Version - Issuing a new release version (version:)

Services Affected

  • API - API models or endpoints updated
  • Workflow - Workflow service updated
  • DB - DB DAOs or migrations updated
  • Site Manager - Site Manager updated
  • Cert Manager - Cert Manager updated
  • Site Agent - Site Agent updated
  • RLA - RLA service updated
  • Powershelf Manager - Powershelf Manager updated
  • NVSwitch Manager - NVSwitch Manager updated

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@hwadekar-nv hwadekar-nv self-assigned this Apr 9, 2026
@hwadekar-nv hwadekar-nv requested a review from a team as a code owner April 9, 2026 19:32
@github-actions

github-actions bot commented Apr 9, 2026

🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉


🕐 Last updated: 2026-04-09 19:35:27 UTC | Commit: abe9985

@coderabbitai

coderabbitai bot commented Apr 9, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 60f87c2e-65a5-4c8e-bd7b-3d3061ab79b0

📥 Commits

Reviewing files that changed from the base of the PR and between ed0995e and f97142a.

📒 Files selected for processing (1)
  • db/pkg/migrations/20260409120000_cleanup_orphan_infiniband_nvlink_interfaces.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • db/pkg/migrations/20260409120000_cleanup_orphan_infiniband_nvlink_interfaces.go

Summary by CodeRabbit

  • Chores

    • Added a database migration to clean up orphaned network interfaces.
  • Bug Fixes

    • Instance deletion now also removes associated InfiniBand and NVLink interfaces.
  • Tests

    • Added unit tests verifying InfiniBand and NVLink interface cleanup during instance deletion.

Walkthrough

Adds a migration that soft-deletes orphaned InfiniBand and NVLink interfaces; updates instance deletion to remove InfiniBand and NVLink interface records within the same transaction; adds a unit test verifying removal.
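
As a rough illustration of the soft-delete step in the migration, the sketch below builds one UPDATE statement per interface table. The table names and the instance_id column come from this PR; the deleted and updated timestamp columns and the instance table name are assumptions made for the example, and the real migration may differ.

```go
package main

import "fmt"

// softDeleteOrphansSQL returns an UPDATE statement that marks rows of the
// given interface table as deleted when no active instance references them.
// Assumed for illustration only: the "deleted" and "updated" columns and the
// "instance" table name. They are not confirmed by the PR text.
func softDeleteOrphansSQL(table string) string {
	return fmt.Sprintf(`UPDATE %[1]s
SET deleted = NOW(), updated = NOW()
WHERE deleted IS NULL
  AND NOT EXISTS (
    SELECT 1 FROM instance
    WHERE instance.id = %[1]s.instance_id
      AND instance.deleted IS NULL
  )`, table)
}

func main() {
	// Print one statement per interface table named in the PR.
	for _, table := range []string{"infiniband_interface", "nvlink_interface"} {
		fmt.Println(softDeleteOrphansSQL(table) + ";")
	}
}
```

Running both statements inside a single transaction, as the walkthrough describes, keeps the two tables consistent if either statement fails.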

Changes

| Cohort / File(s) | Summary |
|---|---|
| Database Migration: db/pkg/migrations/20260409120000_cleanup_orphan_infiniband_nvlink_interfaces.go | New migration registering an up migration that begins a SQL transaction and soft-deletes infiniband_interface and nvlink_interface rows whose instance_id no longer references an active instance; down migration is a no-op. |
| Instance Deletion Logic: workflow/pkg/activity/instance/instance.go | deleteInstanceFromDB now queries and deletes InfiniBand and NVLink interface records for the instance inside the same transaction, rolling back on errors. |
| Tests: workflow/pkg/activity/instance/instance_test.go | New unit test TestManageInstance_deleteInstanceFromDB_removesIBAndNVLinkInterfaces that creates a terminating instance, provisions one InfiniBand and one NVLink interface, runs deleteInstanceFromDB, and asserts those interface rows are removed. |

Sequence Diagram(s)

sequenceDiagram
    participant Manager as Instance Manager
    participant IBDAO as InfiniBand DAO
    participant NVDAO as NVLink DAO
    participant DB as Database

    Manager->>DB: Begin transaction
    Manager->>IBDAO: Get interfaces for InstanceID
    IBDAO->>DB: SELECT ... WHERE instance_id = X
    DB-->>IBDAO: rows
    Manager->>IBDAO: Delete each InfiniBand interface (within tx)
    IBDAO->>DB: DELETE ... WHERE id = Y
    DB-->>IBDAO: OK
    Manager->>NVDAO: Get interfaces for InstanceID
    NVDAO->>DB: SELECT ... WHERE instance_id = X
    DB-->>NVDAO: rows
    Manager->>NVDAO: Delete each NVLink interface (within tx)
    NVDAO->>DB: DELETE ... WHERE id = Z
    DB-->>NVDAO: OK
    Manager->>DB: Commit transaction
    DB-->>Manager: OK
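
The flow in the diagram can be sketched with in-memory stand-ins. Everything below (the iface and store types, cleanupInterfaces, and the snapshot-based rollback) is hypothetical and only mirrors the list-then-delete order and the rollback-on-error behavior; it is not the code from instance.go.

```go
package main

import (
	"errors"
	"fmt"
)

// iface is a hypothetical stand-in for an InfiniBand or NVLink interface row.
type iface struct {
	ID         string
	InstanceID string
}

// store is a hypothetical in-memory stand-in for one interface DAO.
type store struct {
	rows   map[string]iface
	failOn string // ID whose delete fails, to exercise the rollback path
}

// byInstance plays the role of "SELECT ... WHERE instance_id = X".
func (s *store) byInstance(instanceID string) []iface {
	var out []iface
	for _, r := range s.rows {
		if r.InstanceID == instanceID {
			out = append(out, r)
		}
	}
	return out
}

// deleteByID plays the role of "DELETE ... WHERE id = Y".
func (s *store) deleteByID(id string) error {
	if id == s.failOn {
		return errors.New("simulated delete failure")
	}
	delete(s.rows, id)
	return nil
}

// snapshot copies the rows so a failed cleanup can be undone.
func (s *store) snapshot() map[string]iface {
	cp := make(map[string]iface, len(s.rows))
	for k, v := range s.rows {
		cp[k] = v
	}
	return cp
}

// cleanupInterfaces lists and deletes the instance's rows in every store.
// It restores all stores if any delete fails, mimicking a transaction rollback.
func cleanupInterfaces(instanceID string, stores ...*store) error {
	saved := make([]map[string]iface, len(stores)) // "begin transaction"
	for i, s := range stores {
		saved[i] = s.snapshot()
	}
	for _, s := range stores {
		for _, r := range s.byInstance(instanceID) {
			if err := s.deleteByID(r.ID); err != nil {
				for i, st := range stores { // "rollback"
					st.rows = saved[i]
				}
				return fmt.Errorf("delete interface %s: %w", r.ID, err)
			}
		}
	}
	return nil // "commit"
}

// newStore builds a store keyed by interface ID.
func newStore(rows ...iface) *store {
	s := &store{rows: map[string]iface{}}
	for _, r := range rows {
		s.rows[r.ID] = r
	}
	return s
}

func main() {
	ib := newStore(iface{"ib-1", "inst-1"}, iface{"ib-2", "inst-2"})
	nv := newStore(iface{"nv-1", "inst-1"})
	err := cleanupInterfaces("inst-1", ib, nv)
	fmt.Println(err, len(ib.rows), len(nv.rows))
}
```

Passing both stores to one call reflects the point of the change: InfiniBand and NVLink rows are removed together or not at all.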

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main objective of the changeset: ensuring NVLink and InfiniBand interfaces are included in instance cleanup operations. |
| Description check | ✅ Passed | The description clearly articulates both changes: the cleanup function enhancement and the new database migration for orphaned interface removal. |


@github-actions

github-actions bot commented Apr 9, 2026

Test Results

8 597 tests  +1   8 597 ✅ +1   8m 1s ⏱️ +18s
  143 suites ±0       0 💤 ±0 
   14 files   ±0       0 ❌ ±0 

Results for commit f97142a. ± Comparison against base commit 14262c0.

♻️ This comment has been updated with latest results.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
workflow/pkg/activity/instance/instance.go (1)

1695-1739: Prefer bulk deletes for instance-scoped interface cleanup.

These two blocks add two extra reads plus one delete round-trip per interface while the transaction is open. A DAO-level DeleteByInstanceID/ClearByInstanceID helper would keep this path smaller, reduce transaction time, and avoid copying the same list/delete/rollback pattern again for each interface family.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@workflow/pkg/activity/instance/instance.go` around lines 1695 - 1739, Replace
the per-interface read+loop deletes with DAO-level bulk delete helpers to avoid
extra queries and round-trips: add methods like
InfiniBandInterfaceDAO.DeleteByInstanceID(ctx, tx, instanceID) and
NVLinkInterfaceDAO.DeleteByInstanceID(ctx, tx, instanceID) (or
ClearByInstanceID) and call those instead of using
NewInfiniBandInterfaceDAO/GetAll + ibiDAO.Delete loop and
NewNVLinkInterfaceDAO/GetAll + nvliDAO.Delete loop; preserve the existing error
logging and transaction rollback behavior (use the same logger error messages
and tx.Rollback() handling) when the new DeleteByInstanceID calls return an
error.
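
To make the trade-off in this nitpick concrete, the sketch below compares the statement count of the list-then-delete pattern with a bulk helper. The fakeTable type and its methods, including DeleteByInstanceID, are hypothetical stand-ins and do not reflect the project's actual DAO signatures.

```go
package main

import "fmt"

// fakeTable is a hypothetical in-memory stand-in for an interface table.
// It counts statements so the two approaches can be compared.
type fakeTable struct {
	rows       map[string]string // interface ID -> instance ID
	statements int
}

// listByInstance models the read step of the list-then-delete pattern.
func (t *fakeTable) listByInstance(instanceID string) []string {
	t.statements++
	var ids []string
	for id, inst := range t.rows {
		if inst == instanceID {
			ids = append(ids, id)
		}
	}
	return ids
}

// deleteByID models one delete per interface row.
func (t *fakeTable) deleteByID(id string) {
	t.statements++
	delete(t.rows, id)
}

// DeleteByInstanceID is the hypothetical bulk helper: a single statement
// removes every row for the instance and reports how many were removed.
func (t *fakeTable) DeleteByInstanceID(instanceID string) int {
	t.statements++
	removed := 0
	for id, inst := range t.rows {
		if inst == instanceID {
			delete(t.rows, id)
			removed++
		}
	}
	return removed
}

// newTable creates n interface rows that all belong to instanceID.
func newTable(n int, instanceID string) *fakeTable {
	t := &fakeTable{rows: map[string]string{}}
	for i := 0; i < n; i++ {
		t.rows[fmt.Sprintf("if-%d", i)] = instanceID
	}
	return t
}

func main() {
	perRow := newTable(4, "inst-1")
	for _, id := range perRow.listByInstance("inst-1") {
		perRow.deleteByID(id)
	}
	bulk := newTable(4, "inst-1")
	bulk.DeleteByInstanceID("inst-1")
	fmt.Printf("per-row statements: %d, bulk statements: %d\n", perRow.statements, bulk.statements)
}
```

For N interfaces the per-row pattern issues 1 + N statements per interface family, while the bulk helper issues one. With only a handful of interfaces per instance the difference is small.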
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@db/pkg/migrations/20260409120000_cleanup_orphan_infiniband_nvlink_interfaces.go`:
- Around line 36-50: The migration is issuing hard DELETEs against soft-delete
tables (infiniband_interface and nvlink_interface); instead update those rows to
mark them deleted so history is preserved. Replace the tx.Exec DELETE statements
that reference infiniband_interface and nvlink_interface with UPDATE statements
that set the deleted timestamp (and updated timestamp) for rows where there is
no active instance (matching the same NOT EXISTS subquery), and keep using
handleError(tx, err) after each Exec; ensure the UPDATE targets the same tables
and conditions used in the current DELETEs so behavior remains equivalent except
for soft-deleting rather than removing rows.

---

Nitpick comments:
In `@workflow/pkg/activity/instance/instance.go`:
- Around line 1695-1739: Replace the per-interface read+loop deletes with
DAO-level bulk delete helpers to avoid extra queries and round-trips: add
methods like InfiniBandInterfaceDAO.DeleteByInstanceID(ctx, tx, instanceID) and
NVLinkInterfaceDAO.DeleteByInstanceID(ctx, tx, instanceID) (or
ClearByInstanceID) and call those instead of using
NewInfiniBandInterfaceDAO/GetAll + ibiDAO.Delete loop and
NewNVLinkInterfaceDAO/GetAll + nvliDAO.Delete loop; preserve the existing error
logging and transaction rollback behavior (use the same logger error messages
and tx.Rollback() handling) when the new DeleteByInstanceID calls return an
error.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7acc6801-7c81-4bd9-8e01-75cc92b96875

📥 Commits

Reviewing files that changed from the base of the PR and between 07cca2a and abe9985.

📒 Files selected for processing (3)
  • db/pkg/migrations/20260409120000_cleanup_orphan_infiniband_nvlink_interfaces.go
  • workflow/pkg/activity/instance/instance.go
  • workflow/pkg/activity/instance/instance_test.go


// Delete InfiniBand interface(s) corresponding to instance
ibiDAO := cdbm.NewInfiniBandInterfaceDAO(mi.dbSession)
ibis, _, err := ibiDAO.GetAll(ctx, tx, cdbm.InfiniBandInterfaceFilterInput{InstanceIDs: []uuid.UUID{instance.ID}}, cdbp.PageInput{Limit: cdb.GetIntPtr(cdbp.TotalLimit)}, nil)
Contributor


Should we consider a DeleteAll method in DAO instead?

Contributor Author


We can. However, the current pattern (listing by InstanceIDs and deleting each row) is clear, reuses the same Delete path as everywhere else, and is only a problem if we are concerned about N round-trips or very large interface counts per instance. In our case, interface counts per instance are relatively low, so I suggest we keep the same approach as with the other one.

@hwadekar-nv hwadekar-nv force-pushed the fix/ib-nvlink-interface-deletion branch from 3dec9b9 to cd7f67e Compare April 9, 2026 21:24
…ance resources

Signed-off-by: Hitesh Wadekar <hwadekar@nvidia.com>
Signed-off-by: Hitesh Wadekar <hwadekar@nvidia.com>
@hwadekar-nv hwadekar-nv force-pushed the fix/ib-nvlink-interface-deletion branch from cd7f67e to ed0995e Compare April 9, 2026 21:24
Contributor

@thossain-nv thossain-nv left a comment


@hwadekar-nv We can merge this. It would be good to have an efficient DeleteAll but it's not required for this PR and can be added in a separate PR.

@thossain-nv
Contributor

Looks good @hwadekar-nv, let's merge.

@hwadekar-nv hwadekar-nv merged commit 08169ef into main Apr 10, 2026
97 checks passed
@hwadekar-nv hwadekar-nv deleted the fix/ib-nvlink-interface-deletion branch April 10, 2026 16:07
@hwadekar-nv
Contributor Author

Thanks @thossain-nv


2 participants