@akshaydeo
Contributor

@akshaydeo akshaydeo commented Dec 11, 2025

Summary

Fix deadlocks in budget persistence by replacing the SELECT + UPDATE pattern with direct UPDATE statements.

Changes

  • Replaced the read-then-write pattern for budget updates with direct UPDATE statements to avoid lock escalation
  • Removed dependency on errors package as it's no longer needed
  • Improved detection of deleted budgets by checking the number of affected rows
  • Added comments explaining the deadlock issue and solution

Type of change

  • Bug fix
  • Feature
  • Refactor
  • Documentation
  • Chore/CI

Affected areas

  • Core (Go)
  • Transports (HTTP)
  • Providers/Integrations
  • Plugins
  • UI (Next.js)
  • Docs

How to test

Run multiple instances of the application concurrently to verify that budget persistence no longer causes deadlocks:

# Run tests for the governance plugin
go test ./plugins/governance/...

# Run integration tests that exercise budget persistence
go test ./integration/... -run TestBudgetPersistence
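
The concurrent scenario described above can also be approximated in-process. The following is a minimal sketch, not part of this PR: it starts several workers at once and reports a stall if they do not all finish within a timeout. `dumpFunc`, `runConcurrently`, `healthyDump`, `stuckDump`, `errStalled` and `demoTimeout` are hypothetical names introduced for illustration; in a real test, the `dumpFunc` would wrap a call that persists budgets against a shared database.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// dumpFunc is a hypothetical stand-in for one instance persisting its budgets.
type dumpFunc func(ctx context.Context) error

// errStalled is returned when the workers do not all finish in time.
var errStalled = errors.New("workers did not finish before the timeout (possible deadlock)")

// demoTimeout is an arbitrary value chosen for this illustration.
const demoTimeout = 200 * time.Millisecond

// runConcurrently starts `workers` goroutines that each call dump once, then
// waits for all of them or for the timeout, whichever comes first.
func runConcurrently(workers int, timeout time.Duration, dump dumpFunc) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	errs := make(chan error, workers) // buffered so workers never block on send
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			errs <- dump(ctx)
		}()
	}

	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()

	select {
	case <-done:
	case <-ctx.Done():
		// Workers that ignore ctx stay blocked; acceptable in a test harness.
		return errStalled
	}

	close(errs)
	for err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

// healthyDump simulates a persistence call that completes immediately.
func healthyDump(ctx context.Context) error { return nil }

// stuckDump simulates a deadlocked call: it blocks forever and ignores ctx.
func stuckDump(ctx context.Context) error {
	<-make(chan struct{})
	return nil
}

func main() {
	fmt.Println(runConcurrently(8, demoTimeout, healthyDump))
	fmt.Println(runConcurrently(2, demoTimeout, stuckDump) == errStalled)
}
```

A timeout only signals that something stalled; it cannot distinguish a database deadlock from a slow query, so the timeout should be generous compared with a normal flush.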

Breaking changes

  • Yes
  • No

Related issues

Fixes deadlocks that occur when multiple instances attempt to persist budgets simultaneously.

Security considerations

No security implications.

Checklist

  • I read docs/contributing/README.md and followed the guidelines
  • I added/updated tests where appropriate
  • I updated documentation where needed
  • I verified builds succeed (Go and UI)
  • I verified the CI pipeline passes locally if applicable

@coderabbitai
Contributor

coderabbitai bot commented Dec 11, 2025

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes

    • Optimized budget management operations by implementing atomic database updates for improved transaction reliability and reduced operational conflicts
    • Enhanced detection and automatic removal of deleted budget entries to maintain consistent system state
  • Documentation

    • Added clarifying comments for budget operation handling
  • Chores

    • Removed unused code dependencies

Walkthrough

Modified the DumpBudgets function in the governance store to replace the read-then-write pattern with a direct atomic UPDATE operation, preventing deadlocks. Added deletion handling based on RowsAffected and adjusted usage calculations to include baseline values when present.

Changes

Cohort / File(s) | Summary
Deadlock-avoidance refactoring (plugins/governance/store.go) | Removed unused errors import. Replaced budget read-then-write flow with atomic UPDATE to prevent deadlocks. Updated newUsage calculation to include baselines. Added RowsAffected-based deletion handling to remove budgets from in-memory store.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Focus on verifying the atomic UPDATE operation correctly avoids the read-then-write deadlock scenario
  • Confirm the newUsage calculation logic properly incorporates baseline values
  • Validate RowsAffected deletion handling edge cases (e.g., concurrent deletions, race conditions with in-memory removal)

Poem

🐰 A deadlock was born in the read-write dance,
But atomicity came to save the circumstance!
With UPDATE so swift and RowsAffected's keen sight,
The budgets now flow without lock-holding fright. ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name | Status | Explanation | Resolution
Title check | ❓ Inconclusive | The title is vague and generic, using a non-descriptive term 'deadlock fix' that lacks specificity about which component or what the actual change involves. | Improve the title to be more specific, such as 'Fix budget persistence deadlocks by using direct UPDATE statements' to clearly convey the main change.

✅ Passed checks (2 passed)
Check name | Status | Explanation
Description check | ✅ Passed | The pull request description is comprehensive and well-structured, covering all major template sections including summary, changes, type, affected areas, testing instructions, and checklist items.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

Contributor Author

akshaydeo commented Dec 11, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@akshaydeo akshaydeo mentioned this pull request Dec 11, 2025
18 tasks
@akshaydeo akshaydeo marked this pull request as ready for review December 11, 2025 06:17
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 687d705 and 631ac34.

📒 Files selected for processing (1)
  • plugins/governance/store.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • plugins/governance/store.go
🧠 Learnings (1)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.

Applied to files:

  • plugins/governance/store.go
🧬 Code graph analysis (1)
plugins/governance/store.go (2)
core/schemas/models.go (1)
  • Model (109-129)
framework/configstore/tables/budget.go (2)
  • TableBudget (11-20)
  • TableBudget (23-23)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: Graphite / mergeability_check

Comment on lines +547 to 569
+        // Update each budget atomically using direct UPDATE to avoid deadlocks
+        // (SELECT + Save pattern causes deadlocks when multiple instances run concurrently)
         for _, inMemoryBudget := range budgets {
-            // Check if budget exists in database
-            var budget configstoreTables.TableBudget
-            if err := tx.WithContext(ctx).First(&budget, "id = ?", inMemoryBudget.ID).Error; err != nil {
-                // If budget not found then it must be deleted, so we remove it from the in-memory store
-                if errors.Is(err, gorm.ErrRecordNotFound) {
-                    budgetsToDelete = append(budgetsToDelete, inMemoryBudget.ID)
-                    continue
-                }
-                return fmt.Errorf("failed to get budget %s: %w", inMemoryBudget.ID, err)
+            // Calculate the new usage value
+            newUsage := inMemoryBudget.CurrentUsage
+            if baseline, exists := baselines[inMemoryBudget.ID]; exists {
+                newUsage += baseline
             }

-            // Update usage
-            if baseline, exists := baselines[inMemoryBudget.ID]; exists {
-                budget.CurrentUsage = inMemoryBudget.CurrentUsage + baseline
-            } else {
-                budget.CurrentUsage = inMemoryBudget.CurrentUsage
+            // Direct UPDATE avoids read-then-write lock escalation that causes deadlocks
+            result := tx.WithContext(ctx).
+                Model(&configstoreTables.TableBudget{}).
+                Where("id = ?", inMemoryBudget.ID).
+                Update("current_usage", newUsage)
+
+            if result.Error != nil {
+                return fmt.Errorf("failed to update budget %s: %w", inMemoryBudget.ID, result.Error)
             }
-            if err := tx.WithContext(ctx).Save(&budget).Error; err != nil {
-                return fmt.Errorf("failed to save budget %s: %w", inMemoryBudget.ID, err)
+
+            // If no rows affected, budget was deleted from database
+            if result.RowsAffected == 0 {
+                budgetsToDelete = append(budgetsToDelete, inMemoryBudget.ID)
             }
Contributor

⚠️ Potential issue | 🔴 Critical

Do not treat RowsAffected == 0 as "budget deleted" – this will silently drop active budgets.

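The new RowsAffected == 0 check is unsafe because RowsAffected semantics differ across drivers. On MySQL with default connection settings (that is, without CLIENT_FOUND_ROWS, which go-sql-driver/mysql exposes as clientFoundRows=true), an UPDATE that sets a column to its existing value reports RowsAffected == 0 even when the row still exists. SQLite (Bifrost's default) and PostgreSQL count matched rows, so they do not show this behavior, but the check is still not portable. On a driver that counts only changed rows, this function behaves as follows: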

  • Any budget whose current_usage hasn't changed since the last flush will be interpreted as "deleted from database".
  • Its ID is added to budgetsToDelete, and then removed from gs.budgets via the deletion loop below this block.
  • Subsequent budget checks that rely on collectBudgetsFromHierarchy / gs.budgets will silently stop enforcing those budgets.

That's a correctness bug: live budgets disappear from governance enforcement just because their usage was stable between dumps.
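
To make the failure mode concrete, here is a minimal, self-contained sketch that uses an in-memory map instead of a database. It models two ways a driver can count rows for an UPDATE (rows matched by the WHERE clause versus rows whose values actually changed) and applies the PR's RowsAffected == 0 rule to a budget whose usage is unchanged. `fakeBudgetTable`, `updateUsage` and `budgetsFlaggedDeleted` are hypothetical names for illustration only; they are not Bifrost or GORM APIs.

```go
package main

import "fmt"

// rowsAffectedMode models how a driver counts rows for an UPDATE statement.
type rowsAffectedMode int

const (
	countMatched rowsAffectedMode = iota // rows matched by the WHERE clause
	countChanged                         // only rows whose values actually changed
)

// fakeBudgetTable is an in-memory stand-in for the budgets table (illustration only).
type fakeBudgetTable struct {
	mode  rowsAffectedMode
	usage map[string]float64
}

// updateUsage mimics "UPDATE ... SET current_usage = ? WHERE id = ?" and
// returns the simulated RowsAffected value.
func (t *fakeBudgetTable) updateUsage(id string, newUsage float64) int64 {
	old, ok := t.usage[id]
	if !ok {
		return 0 // the row really is gone
	}
	t.usage[id] = newUsage
	if t.mode == countChanged && old == newUsage {
		return 0 // the row exists, but nothing changed
	}
	return 1
}

// budgetsFlaggedDeleted applies the RowsAffected == 0 rule from the PR and
// returns the IDs that would be removed from the in-memory store.
func budgetsFlaggedDeleted(t *fakeBudgetTable, inMemory map[string]float64) []string {
	var flagged []string
	for id, usage := range inMemory {
		if t.updateUsage(id, usage) == 0 {
			flagged = append(flagged, id)
		}
	}
	return flagged
}

func main() {
	inMemory := map[string]float64{"b1": 10} // usage unchanged since the last flush
	for _, mode := range []rowsAffectedMode{countMatched, countChanged} {
		table := &fakeBudgetTable{mode: mode, usage: map[string]float64{"b1": 10}}
		fmt.Println(len(budgetsFlaggedDeleted(table, inMemory)))
	}
}
```

Under the matched-rows model nothing is flagged, while under the changed-rows model the live budget b1 is flagged for deletion even though its row still exists.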

Given the deadlock fix goal, a safe minimal change is to keep the direct UPDATE but stop mutating in‑memory state based solely on RowsAffected. You can still log this condition for observability and add a more dialect-aware existence check later if needed.

Concrete suggestion for this block:

-        // Direct UPDATE avoids read-then-write lock escalation that causes deadlocks
+        // Direct UPDATE avoids read-then-write lock escalation that causes deadlocks
         result := tx.WithContext(ctx).
             Model(&configstoreTables.TableBudget{}).
             Where("id = ?", inMemoryBudget.ID).
             Update("current_usage", newUsage)

         if result.Error != nil {
             return fmt.Errorf("failed to update budget %s: %w", inMemoryBudget.ID, result.Error)
         }

-        // If no rows affected, budget was deleted from database
-        if result.RowsAffected == 0 {
-            budgetsToDelete = append(budgetsToDelete, inMemoryBudget.ID)
-        }
+        // NOTE: RowsAffected == 0 is ambiguous across drivers (e.g. MySQL without
+        // clientFoundRows, when the value doesn't actually change). Don't treat this as
+        // "budget deleted" to avoid silently dropping valid budgets from the in-memory store.
+        // If we need automatic cleanup of deleted budgets, we should add a dialect-aware existence check instead.

With this change, budgetsToDelete remains empty and the cleanup loop below becomes a no-op. This preserves the existing behavior of the in-memory budgets while still fixing the deadlock through a direct UPDATE.
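
If automatic cleanup of deleted budgets is wanted later, one possible shape for the existence check mentioned above is sketched here. It treats RowsAffected == 0 only as a hint and confirms the deletion with a separate lookup. `existsFunc` and `confirmDeleted` are hypothetical names, and the lookup is passed in as a function because how it would be issued (which query, and whether inside or outside the update transaction) is an open design question that this sketch does not settle.

```go
package main

import (
	"errors"
	"fmt"
)

// existsFunc reports whether a budget row is still present. Hypothetical hook:
// in a real store it would be backed by a query against the budgets table.
type existsFunc func(id string) (bool, error)

// confirmDeleted decides whether a budget may be dropped from the in-memory
// store after an UPDATE. A zero RowsAffected is only a hint; the budget is
// reported as deleted only when the existence check also finds no row.
func confirmDeleted(id string, rowsAffected int64, exists existsFunc) (bool, error) {
	if rowsAffected > 0 {
		return false, nil // the row was updated, so it clearly exists
	}
	found, err := exists(id)
	if err != nil {
		return false, fmt.Errorf("failed to check budget %s: %w", id, err)
	}
	return !found, nil
}

func main() {
	inDB := map[string]bool{"b1": true}
	exists := func(id string) (bool, error) { return inDB[id], nil }
	broken := func(string) (bool, error) { return false, errors.New("db unavailable") }

	fmt.Println(confirmDeleted("b1", 0, exists)) // row present, usage unchanged
	fmt.Println(confirmDeleted("b2", 0, exists)) // row actually missing
	fmt.Println(confirmDeleted("b1", 0, broken)) // lookup failed: keep the budget
}
```

On a lookup error the sketch keeps the budget in memory, on the assumption that failing to enforce a live budget is worse than retaining a stale one until the next flush.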

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Update each budget atomically using direct UPDATE to avoid deadlocks
// (SELECT + Save pattern causes deadlocks when multiple instances run concurrently)
for _, inMemoryBudget := range budgets {
    // Calculate the new usage value
    newUsage := inMemoryBudget.CurrentUsage
    if baseline, exists := baselines[inMemoryBudget.ID]; exists {
        newUsage += baseline
    }

    // Direct UPDATE avoids read-then-write lock escalation that causes deadlocks
    result := tx.WithContext(ctx).
        Model(&configstoreTables.TableBudget{}).
        Where("id = ?", inMemoryBudget.ID).
        Update("current_usage", newUsage)
    if result.Error != nil {
        return fmt.Errorf("failed to update budget %s: %w", inMemoryBudget.ID, result.Error)
    }

    // NOTE: RowsAffected == 0 is ambiguous across drivers (e.g. MySQL without
    // clientFoundRows, when the value doesn't actually change). Don't treat this as
    // "budget deleted" to avoid silently dropping valid budgets from the in-memory store.
    // If we need automatic cleanup of deleted budgets, we should add a dialect-aware existence check instead.
🤖 Prompt for AI Agents
In plugins/governance/store.go around lines 547 to 569, the code treats
result.RowsAffected == 0 as "budget deleted". This is not portable: on drivers
that count only changed rows (e.g. MySQL without clientFoundRows), an UPDATE
that doesn't change values yields 0 affected rows. Remove the logic that appends
IDs to budgetsToDelete based solely on RowsAffected, and instead only log the
zero-rows-affected case for observability (or perform a dialect-aware existence
check if needed later), so the direct UPDATE remains to avoid deadlocks but
in-memory budgets are not removed when usage was unchanged.

3 participants