fix: Include NVLink and InfiniBand Intrerfaces while cleaning up Instance resources#366
Conversation
🔐 TruffleHog Secret Scan✅ No secrets or credentials found! Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉 🕐 Last updated: 2026-04-09 19:35:27 UTC | Commit: abe9985 |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Summary by CodeRabbit
WalkthroughAdds a migration that soft-deletes orphaned InfiniBand and NVLink interfaces; updates instance deletion to remove InfiniBand and NVLink interface records within the same transaction; adds a unit test verifying removal. Changes
Sequence Diagram(s)sequenceDiagram
participant Manager as Instance Manager
participant IBDAO as InfiniBand DAO
participant NVDAO as NVLink DAO
participant DB as Database
Manager->>DB: Begin transaction
Manager->>IBDAO: Get interfaces for InstanceID
IBDAO->>DB: SELECT ... WHERE instance_id = X
DB-->>IBDAO: rows
Manager->>IBDAO: Delete each InfiniBand interface (within tx)
IBDAO->>DB: DELETE ... WHERE id = Y
DB-->>IBDAO: OK
Manager->>NVDAO: Get interfaces for InstanceID
NVDAO->>DB: SELECT ... WHERE instance_id = X
DB-->>NVDAO: rows
Manager->>NVDAO: Delete each NVLink interface (within tx)
NVDAO->>DB: DELETE ... WHERE id = Z
DB-->>NVDAO: OK
Manager->>DB: Commit transaction
DB-->>Manager: OK
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
workflow/pkg/activity/instance/instance.go (1)
1695-1739: Prefer bulk deletes for instance-scoped interface cleanup.These two blocks add two extra reads plus one delete round-trip per interface while the transaction is open. A DAO-level
DeleteByInstanceID/ClearByInstanceIDhelper would keep this path smaller, reduce transaction time, and avoid copying the same list/delete/rollback pattern again for each interface family.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@workflow/pkg/activity/instance/instance.go` around lines 1695 - 1739, Replace the per-interface read+loop deletes with DAO-level bulk delete helpers to avoid extra queries and round-trips: add methods like InfiniBandInterfaceDAO.DeleteByInstanceID(ctx, tx, instanceID) and NVLinkInterfaceDAO.DeleteByInstanceID(ctx, tx, instanceID) (or ClearByInstanceID) and call those instead of using NewInfiniBandInterfaceDAO/GetAll + ibiDAO.Delete loop and NewNVLinkInterfaceDAO/GetAll + nvliDAO.Delete loop; preserve the existing error logging and transaction rollback behavior (use the same logger error messages and tx.Rollback() handling) when the new DeleteByInstanceID calls return an error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@db/pkg/migrations/20260409120000_cleanup_orphan_infiniband_nvlink_interfaces.go`:
- Around line 36-50: The migration is issuing hard DELETEs against soft-delete
tables (infiniband_interface and nvlink_interface); instead update those rows to
mark them deleted so history is preserved. Replace the tx.Exec DELETE statements
that reference infiniband_interface and nvlink_interface with UPDATE statements
that set the deleted timestamp (and updated timestamp) for rows where there is
no active instance (matching the same NOT EXISTS subquery), and keep using
handleError(tx, err) after each Exec; ensure the UPDATE targets the same tables
and conditions used in the current DELETEs so behavior remains equivalent except
for soft-deleting rather than removing rows.
---
Nitpick comments:
In `@workflow/pkg/activity/instance/instance.go`:
- Around line 1695-1739: Replace the per-interface read+loop deletes with
DAO-level bulk delete helpers to avoid extra queries and round-trips: add
methods like InfiniBandInterfaceDAO.DeleteByInstanceID(ctx, tx, instanceID) and
NVLinkInterfaceDAO.DeleteByInstanceID(ctx, tx, instanceID) (or
ClearByInstanceID) and call those instead of using
NewInfiniBandInterfaceDAO/GetAll + ibiDAO.Delete loop and
NewNVLinkInterfaceDAO/GetAll + nvliDAO.Delete loop; preserve the existing error
logging and transaction rollback behavior (use the same logger error messages
and tx.Rollback() handling) when the new DeleteByInstanceID calls return an
error.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7acc6801-7c81-4bd9-8e01-75cc92b96875
📒 Files selected for processing (3)
db/pkg/migrations/20260409120000_cleanup_orphan_infiniband_nvlink_interfaces.goworkflow/pkg/activity/instance/instance.goworkflow/pkg/activity/instance/instance_test.go
db/pkg/migrations/20260409120000_cleanup_orphan_infiniband_nvlink_interfaces.go
Outdated
Show resolved
Hide resolved
|
|
||
| // Delete InfiniBand interface(s) corresponding to instance | ||
| ibiDAO := cdbm.NewInfiniBandInterfaceDAO(mi.dbSession) | ||
| ibis, _, err := ibiDAO.GetAll(ctx, tx, cdbm.InfiniBandInterfaceFilterInput{InstanceIDs: []uuid.UUID{instance.ID}}, cdbp.PageInput{Limit: cdb.GetIntPtr(cdbp.TotalLimit)}, nil) |
There was a problem hiding this comment.
Should we consider a DeleteAll method in DAO instead?
There was a problem hiding this comment.
We can, however, observe that the current pattern (listing by InstanceIDs and deleting each row) is clear, reuses the same Delete path as everywhere else, and is only incorrect if we are concerned about N round-trips or very large interface counts per instance. In our cases, these interface counts are relatively low, so I would suggest that we maintain the same approach as with the other one.
3dec9b9 to
cd7f67e
Compare
…ance resources Signed-off-by: Hitesh Wadekar <hwadekar@nvidia.com>
Signed-off-by: Hitesh Wadekar <hwadekar@nvidia.com>
cd7f67e to
ed0995e
Compare
thossain-nv
left a comment
There was a problem hiding this comment.
@hwadekar-nv We can merge this. It would be good to have an efficient DeleteAll but it's not required for this PR and can be added in a separate PR.
|
Looks good @hwadekar-nv, let's merge. |
|
Thanks @thossain-nv |
Type of Change
Services Affected
Related Issues (Optional)
Breaking Changes
Testing
Additional Notes