-
Notifications
You must be signed in to change notification settings - Fork 88
fix: trigger bucket failover immediately on 429/402/401 errors (#1006) #1009
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Bucket failover was only triggering after 2 consecutive 429s and never for 402/401 errors, causing requests to fail instead of failing over to the next bucket. Changes: - Reduced failoverThreshold from 2 to 1 for immediate failover on first 429 - Added 402 to the failover trigger conditions - Added 401 handling: retry once before attempting bucket failover - Moved failover logic before the throw check to ensure it runs even when maxAttempts is low Respects the expected behavior: - On 429/402: immediately advance to next bucket - On 401: retry once after token refresh, then failover closes #1006
WalkthroughAdds 401 and 402 handling to retry logic with a one-time refresh-before-failover for 401, generalizes and lowers failover gating, resets counters/delay after failover; expands tests for 401/402/429 and bucket exhaustion; centralizes sandbox env passthrough and adds welcome-config setup/cleanup in integration tests. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant RetryLogic as "Retry Logic"
participant Server
participant Bucket as "Bucket/Auth"
Client->>RetryLogic: performOperation()
RetryLogic->>Server: request(with current bucket token)
Server-->>RetryLogic: response (200 / 401 / 402 / 429 / 5xx)
alt 401 and refresh allowed
Note right of RetryLogic `#cfe8ff`: detect 401 — allow one refresh retry
RetryLogic->>Bucket: request token refresh
Bucket-->>RetryLogic: refreshed credentials
RetryLogic->>Server: retry(with refreshed token)
Server-->>RetryLogic: response
else 402 or persistent 429s or refresh failed
Note right of RetryLogic `#fde8e8`: trigger failover — call onPersistent429
RetryLogic->>Bucket: invoke onPersistent429 (get new auth or false)
Bucket-->>RetryLogic: new bucket/auth or exhaustion
RetryLogic->>Server: retry(with new bucket token)
Server-->>RetryLogic: response
end
alt Success
RetryLogic-->>Client: return result
else Exhausted or final error
RetryLogic-->>Client: throw final error (preserve original condition)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (3 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
I'll review this PR that addresses bucket failover issues for OAuth profiles. Let me start by organizing my review approach. Let me see the new tests that were added. I can see the file has been truncated. Let me read the new tests: LLxprt PR Review – PR #1009Issue Alignment Side Effects
Code Quality
Tests & Coverage
Verdict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
packages/core/src/utils/retry.ts (1)
337-344: Clarify refresh behavior on 401.The
shouldAttemptRefreshRetryflag allows one retry before failover on 401, but the actual token refresh appears to happen only whenonPersistent429is called during failover. The debug log says "retrying once before bucket failover," which implies the retry itself may trigger token refresh through the normal request flow (if the OAuth layer auto-refreshes on retry).This design relies on the caller's
onPersistent429implementation to handle token refresh. Consider adding a comment clarifying that the "refresh retry" assumes the OAuth layer handles token refresh automatically on subsequent requests, or thatonPersistent429should perform the refresh.packages/core/src/utils/retry.test.ts (1)
728-754: Consider asserting exact call count for clarity.The test correctly validates that an error is thrown when no more buckets are available. However, it only asserts
failoverCallback.toHaveBeenCalled()without specifying the expected count.Given
failoverThreshold = 1, the flow should be:
- First call → 429 → immediate failover attempt → returns false → throws
Consider adding an explicit assertion for
mockFncall count to document the expected behavior:🔎 Suggested assertion
expect(result.message).toBe('Rate limit exceeded'); expect(failoverCallback).toHaveBeenCalled(); + expect(failoverCallback).toHaveBeenCalledTimes(1); + expect(mockFn).toHaveBeenCalledTimes(1); });
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
packages/core/src/utils/retry.test.tspackages/core/src/utils/retry.ts
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Don't useany- Always specify proper types. Useunknownif the type is truly unknown and add proper type guards.
Do not useconsole.logorconsole.debug- Use the sophisticated logging system instead. Log files are written to ~/.llxprt/debug/
Fix all linting errors, including warnings aboutanytypes
Files:
packages/core/src/utils/retry.test.tspackages/core/src/utils/retry.ts
🧠 Learnings (2)
📓 Common learnings
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2025-12-13T16:04:07.079Z
Learning: Support multi-bucket failover on a single profile: ordered buckets tried in sequence; on 429 or 402 advance to next bucket; on 401 attempt auto-renew then retry once before advancing.
📚 Learning: 2025-12-13T16:04:07.079Z
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2025-12-13T16:04:07.079Z
Learning: Support multi-bucket failover on a single profile: ordered buckets tried in sequence; on 429 or 402 advance to next bucket; on 401 attempt auto-renew then retry once before advancing.
Applied to files:
packages/core/src/utils/retry.test.tspackages/core/src/utils/retry.ts
🧬 Code graph analysis (2)
packages/core/src/utils/retry.test.ts (1)
packages/core/src/utils/retry.ts (2)
HttpError(12-14)retryWithBackoff(263-444)
packages/core/src/utils/retry.ts (2)
packages/core/src/providers/openai-responses/OpenAIResponsesProvider.ts (1)
shouldRetryOnError(136-150)packages/core/src/providers/gemini/GeminiProvider.ts (1)
shouldRetryOnError(239-263)
⏰ Context from checks skipped due to timeout of 270000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Test (ubuntu-latest, 24.x)
- GitHub Check: Test (windows-latest, 24.x)
- GitHub Check: Test (macos-latest, 24.x)
- GitHub Check: E2E Test (Linux) - sandbox:none
- GitHub Check: E2E Test (macOS)
- GitHub Check: E2E Test (Linux) - sandbox:docker
- GitHub Check: Slow E2E - Win
🔇 Additional comments (9)
packages/core/src/utils/retry.ts (6)
290-291: LGTM! Initialization aligns with PR objectives.The
consecutive401scounter andfailoverThreshold = 1correctly implement the requirement for immediate bucket failover on first 429/402 error. Based on learnings, this matches the expected behavior: "on 429 or 402 advance to next bucket."
316-318: LGTM!Clean extraction of status codes for use in failover and refresh logic.
331-335: LGTM!Correctly tracks consecutive 401 errors, mirroring the existing 429 tracking pattern.
346-351: LGTM! Failover conditions correctly implement the requirements.The logic properly differentiates:
- 429/402: Immediate failover on first occurrence
- 401: Failover only after the first retry fails (consecutive401s > 1)
This aligns with the learning that 429/402 should advance immediately while 401 should attempt auto-renew first.
355-386: LGTM! Failover execution properly resets state.Good implementation:
- Dynamic failover reason in debug logs improves observability
- Both
consecutive429sandconsecutive401sreset after successful bucket switch- Attempt counter decremented for a fresh start with the new bucket
388-400: Code correctly handles 401 errors with intentional refresh retry path.The default retry predicate returns
trueonly for 429 and 5xx errors, plus transient network errors. Since 401 is not retryable by default, it requiresshouldAttemptRefreshRetryto bypass the immediate throw on line 390-392. Theattempt--on line 399 allows this refresh attempt without consuming againstmaxAttempts, ensuring 401 gets exactly one retry through the refresh token mechanism before failover triggers.packages/core/src/utils/retry.test.ts (3)
608-642: LGTM! Test correctly validates immediate 429 failover.The test properly verifies that:
- Failover triggers on the first 429 error (not waiting for multiple consecutive errors)
- The operation succeeds after bucket switch
mockFnis called twice (initial fail + retry with new bucket)Good use of
maxAttempts: 1to verify failover doesn't consume the attempt budget.
644-681: LGTM! Test correctly validates immediate 402 failover.Good test coverage for the 402 (Payment Required) path:
- Verifies immediate failover without retry
- Asserts callback receives correct
authTypeand error- Confirms operation succeeds after bucket switch
683-719: LGTM! Test correctly validates 401 refresh-retry-before-failover behavior.The test properly verifies the 401 flow:
- First 401 → refresh retry (no failover yet)
- Second 401 → triggers bucket failover
- Third call → succeeds with new bucket
This matches the learning: "on 401 attempt auto-renew then retry once before advancing."
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-24.x-ubuntu-latest' artifact from the main CI run. |
- Add clarifying comment for 401 refresh retry behavior explaining OAuth refresh - Update debug message for clarity - Add exact call count assertions in 'no more buckets' test
The welcome onboarding flow now runs on first launch and blocks interactive input, causing the 'Type your message' prompt to never appear. This test waits for that prompt and times out. Fix by pointing LLXPRT_CODE_WELCOME_CONFIG_PATH to a test-specific config file with welcomeCompleted=true, which only affects this test without breaking the onboarding feature for real users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Fix all issues with AI Agents 🤖
In @integration-tests/mcp_server_cyclic_schema.test.ts:
- Around line 199-201: The afterAll cleanup currently only deletes the
LLXPRT_CODE_WELCOME_CONFIG_PATH env var; extend the afterAll hook to also remove
the welcome-config.json file created by the test by checking for its existence
and deleting it (use fs.existsSync + fs.unlinkSync or fs.rmSync) with the same
path value (process.env.LLXPRT_CODE_WELCOME_CONFIG_PATH or the concrete path
used when creating the file) to ensure no test artifacts remain on disk; keep
the env var deletion and guard file deletion in a try/catch to avoid failing
teardown if the file is already gone.
🧹 Nitpick comments (1)
integration-tests/mcp_server_cyclic_schema.test.ts (1)
178-186: Clarify the relationship between welcome config setup and PR objectives.The welcome config setup appears unrelated to the PR's main objectives (fixing 429/402/401 retry and bucket failover logic). This test focuses on MCP cyclic schemas, not authentication or rate limiting.
Could you clarify why this environment setup was necessary for this PR? Was there an issue with welcome prompts interfering with the test after the retry logic changes?
Minor note: The non-null assertion
rig.testDir!assumestestDiris always set aftersetup(), which should be safe but could fail silently if setup doesn't complete successfully.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
integration-tests/mcp_server_cyclic_schema.test.ts
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Don't useany- Always specify proper types. Useunknownif the type is truly unknown and add proper type guards.
Do not useconsole.logorconsole.debug- Use the sophisticated logging system instead. Log files are written to ~/.llxprt/debug/
Fix all linting errors, including warnings aboutanytypes
Files:
integration-tests/mcp_server_cyclic_schema.test.ts
🧠 Learnings (2)
📓 Common learnings
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2025-12-13T16:04:07.079Z
Learning: Support multi-bucket failover on a single profile: ordered buckets tried in sequence; on 429 or 402 advance to next bucket; on 401 attempt auto-renew then retry once before advancing.
📚 Learning: 2025-11-25T16:56:18.980Z
Learnt from: CR
Repo: vybestack/llxprt-code PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T16:56:18.980Z
Learning: Before reporting a task as finished, run `node scripts/start.js --profile-load synthetic --prompt "write me a haiku"` from the repository root and ensure it succeeds (exit code 0)
Applied to files:
integration-tests/mcp_server_cyclic_schema.test.ts
⏰ Context from checks skipped due to timeout of 270000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Test (macos-latest, 24.x)
- GitHub Check: Test (windows-latest, 24.x)
- GitHub Check: Test (ubuntu-latest, 24.x)
- GitHub Check: E2E Test (Linux) - sandbox:docker
- GitHub Check: E2E Test (macOS)
- GitHub Check: E2E Test (Linux) - sandbox:none
- GitHub Check: Slow E2E - Win
🔇 Additional comments (1)
integration-tests/mcp_server_cyclic_schema.test.ts (1)
20-20: LGTM!The
afterAllimport is correctly added to support test cleanup.
Clear the welcome config cache with resetWelcomeConfigForTesting() at the start of beforeAll to ensure the config is loaded fresh for this test. Clean up the test welcome-config.json file in afterAll to avoid leaving test artifacts on disk. This fixes the test failure where the welcome onboarding was still running despite setting LLXPRT_CODE_WELCOME_CONFIG_PATH, due to the module-level cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
integration-tests/mcp_server_cyclic_schema.test.ts (1)
202-208: Add defensive error handling to file cleanup.The cleanup removes the welcome config file, addressing the past review concern about test artifacts. However,
unlinkSyncwill throw if the file doesn't exist or can't be deleted, causing test teardown to fail.Wrap the file deletion in a try-catch block to ensure cleanup completes even if the file is missing.
🔎 Proposed fix with defensive error handling
afterAll(() => { if (process.env.LLXPRT_CODE_WELCOME_CONFIG_PATH) { - unlinkSync(process.env.LLXPRT_CODE_WELCOME_CONFIG_PATH); + try { + unlinkSync(process.env.LLXPRT_CODE_WELCOME_CONFIG_PATH); + } catch { + // File may not exist or already cleaned up + } } delete process.env.LLXPRT_CODE_WELCOME_CONFIG_PATH; });Based on the previous review comment suggestion.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
integration-tests/mcp_server_cyclic_schema.test.ts
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Don't useany- Always specify proper types. Useunknownif the type is truly unknown and add proper type guards.
Do not useconsole.logorconsole.debug- Use the sophisticated logging system instead. Log files are written to ~/.llxprt/debug/
Fix all linting errors, including warnings aboutanytypes
Files:
integration-tests/mcp_server_cyclic_schema.test.ts
🧠 Learnings (4)
📓 Common learnings
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2025-12-13T16:04:07.079Z
Learning: Support multi-bucket failover on a single profile: ordered buckets tried in sequence; on 429 or 402 advance to next bucket; on 401 attempt auto-renew then retry once before advancing.
📚 Learning: 2025-11-25T16:56:18.980Z
Learnt from: CR
Repo: vybestack/llxprt-code PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T16:56:18.980Z
Learning: Before reporting a task as finished, run `npm run test` from the repository root and ensure it succeeds (exit code 0)
Applied to files:
integration-tests/mcp_server_cyclic_schema.test.ts
📚 Learning: 2025-11-25T16:56:18.980Z
Learnt from: CR
Repo: vybestack/llxprt-code PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T16:56:18.980Z
Learning: Before reporting a task as finished, run `node scripts/start.js --profile-load synthetic --prompt "write me a haiku"` from the repository root and ensure it succeeds (exit code 0)
Applied to files:
integration-tests/mcp_server_cyclic_schema.test.ts
📚 Learning: 2025-12-05T14:59:34.368Z
Learnt from: CR
Repo: vybestack/llxprt-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-05T14:59:34.368Z
Learning: Always run `npm run lint` from the main llxprt-code directory before considering work complete
Applied to files:
integration-tests/mcp_server_cyclic_schema.test.ts
🧬 Code graph analysis (1)
integration-tests/mcp_server_cyclic_schema.test.ts (1)
packages/cli/src/config/welcomeConfig.ts (1)
resetWelcomeConfigForTesting(28-30)
⏰ Context from checks skipped due to timeout of 270000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Test (windows-latest, 24.x)
- GitHub Check: Test (macos-latest, 24.x)
- GitHub Check: Test (ubuntu-latest, 24.x)
- GitHub Check: E2E Test (macOS)
- GitHub Check: E2E Test (Linux) - sandbox:docker
- GitHub Check: Slow E2E - Win
- GitHub Check: E2E Test (Linux) - sandbox:none
🔇 Additional comments (2)
integration-tests/mcp_server_cyclic_schema.test.ts (2)
18-21: LGTM! Clean import additions for test lifecycle.The imports support proper test setup/teardown:
unlinkSyncfor file cleanup, explicit Vitest lifecycle hooks following modern conventions, andresetWelcomeConfigForTestingfor clearing cached config state.
167-177: LGTM! Proper test isolation with welcome config setup.The setup correctly resets cached state, configures a test-specific welcome config path, and writes the config file before initializing the test rig. This ensures tests run with a predictable environment.
The welcome config file must be created AFTER rig.setup() so that rig.testDir is initialized. Previously, the config was created before setup when testDir was still null, causing the spawned CLI to fall back to the default welcome config path. Also removed resetWelcomeConfigForTesting() since it only clears the cache in the test process, not the spawned CLI process. This ensures the CLI under test reads the correct welcome config and skips onboarding.
The sandbox was not forwarding LLXPRT_CODE_WELCOME_CONFIG_PATH to the spawned CLI process, causing it to fall back to the default welcome config path and triggering onboarding instead of showing the prompt. Added LLXPRT_CODE_WELCOME_CONFIG_PATH to the curated list of environment variables that are forwarded to: - Docker/podman containers via --env flags - macOS sandbox-exec processes via spawn(..., env) Also added defensive error handling in test cleanup for the welcome config file removal. This allows tests to control welcome onboarding by setting a custom config path with welcomeCompleted: true. closes #1006
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
packages/cli/src/utils/sandbox.ts (1)
334-340: Consider returning key-value pairs instead of parsing CLI args back.The function
buildSandboxEnvArgsproduces CLI-formatted args (--env,K=V), which are then parsed back into key-value pairs here. This is an indirect approach.🔎 Suggested refactor: separate concerns
Create a helper that returns the raw key-value pairs, then format as CLI args where needed:
// Returns { key: value } pairs for passthrough env vars function getPassthroughEnvVars(env: NodeJS.ProcessEnv): Record<string, string> { const passthroughVariables = [ 'LLXPRT_CODE_IDE_SERVER_PORT', 'LLXPRT_CODE_IDE_WORKSPACE_PATH', 'LLXPRT_CODE_WELCOME_CONFIG_PATH', 'TERM_PROGRAM', ]; const result: Record<string, string> = {}; for (const envVar of passthroughVariables) { const value = env[envVar]; if (value) { result[envVar] = value; } } return result; } export function buildSandboxEnvArgs(env: NodeJS.ProcessEnv): string[] { const args: string[] = []; for (const [key, value] of Object.entries(getPassthroughEnvVars(env))) { args.push('--env', `${key}=${value}`); } return args; }Then use
Object.assign(sandboxEnv, getPassthroughEnvVars(process.env))instead of the parsing loop.integration-tests/mcp_server_cyclic_schema.test.ts (1)
178-186: Welcome config setup correctly isolates test from interactive prompts.The setup creates a welcome config file to skip the welcome wizard during test execution, which is appropriate for integration tests. The ordering after
rig.setup()ensurestestDiris available.The non-null assertion
rig.testDir!assumessetup()succeeded. While this should always be true in normal test execution, a defensive check could improve error clarity:🔎 Optional: Add defensive check for testDir
+ if (!rig.testDir) { + throw new Error('Test setup failed: rig.testDir is not initialized'); + } + process.env.LLXPRT_CODE_WELCOME_CONFIG_PATH = join( rig.testDir!, 'welcome-config.json', );
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
integration-tests/mcp_server_cyclic_schema.test.tspackages/cli/src/utils/sandbox.ts
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{ts,tsx}: Don't useany- Always specify proper types. Useunknownif the type is truly unknown and add proper type guards.
Do not useconsole.logorconsole.debug- Use the sophisticated logging system instead. Log files are written to ~/.llxprt/debug/
Fix all linting errors, including warnings aboutanytypes
Files:
integration-tests/mcp_server_cyclic_schema.test.tspackages/cli/src/utils/sandbox.ts
🧠 Learnings (4)
📓 Common learnings
Learnt from: acoliver
Repo: vybestack/llxprt-code PR: 0
File: :0-0
Timestamp: 2025-12-13T16:04:07.079Z
Learning: Support multi-bucket failover on a single profile: ordered buckets tried in sequence; on 429 or 402 advance to next bucket; on 401 attempt auto-renew then retry once before advancing.
📚 Learning: 2025-11-25T16:56:18.980Z
Learnt from: CR
Repo: vybestack/llxprt-code PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T16:56:18.980Z
Learning: Before reporting a task as finished, run `npm run test` from the repository root and ensure it succeeds (exit code 0)
Applied to files:
integration-tests/mcp_server_cyclic_schema.test.ts
📚 Learning: 2025-11-25T16:56:18.980Z
Learnt from: CR
Repo: vybestack/llxprt-code PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-25T16:56:18.980Z
Learning: Before reporting a task as finished, run `node scripts/start.js --profile-load synthetic --prompt "write me a haiku"` from the repository root and ensure it succeeds (exit code 0)
Applied to files:
integration-tests/mcp_server_cyclic_schema.test.ts
📚 Learning: 2025-12-05T14:59:34.368Z
Learnt from: CR
Repo: vybestack/llxprt-code PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-05T14:59:34.368Z
Learning: Always run `npm run lint` from the main llxprt-code directory before considering work complete
Applied to files:
integration-tests/mcp_server_cyclic_schema.test.ts
⏰ Context from checks skipped due to timeout of 270000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Test (windows-latest, 24.x)
- GitHub Check: Test (ubuntu-latest, 24.x)
- GitHub Check: Test (macos-latest, 24.x)
- GitHub Check: E2E Test (Linux) - sandbox:docker
- GitHub Check: E2E Test (macOS)
- GitHub Check: Slow E2E - Win
- GitHub Check: E2E Test (Linux) - sandbox:none
🔇 Additional comments (6)
packages/cli/src/utils/sandbox.ts (4)
56-73: LGTM! Clean utility for centralizing env passthrough logic.The function properly collects and formats environment variables for sandbox container invocation. Good use of explicit typing and clear variable naming.
253-255: LGTM!The relative path resolution using
import.meta.urlis the correct ES module pattern for locating sibling files.
419-422: LGTM!The sandbox process now correctly receives the curated environment variables via
sandboxEnv.
811-812: LGTM!Clean and direct usage of the utility function for CLI arg construction.
integration-tests/mcp_server_cyclic_schema.test.ts (2)
18-20: LGTM! Imports support welcome config lifecycle.The added imports (
unlinkSync,writeFileSync,afterAll) are properly structured and necessary for the welcome config setup and cleanup pattern.
199-209: Excellent cleanup implementation addressing previous feedback.The
afterAllhook now properly removes both the welcome config file and environment variable, with defensive error handling to prevent teardown failures. This addresses the past review concern about leaving test artifacts on disk.
Problem
When a profile with multiple OAuth buckets configured (e.g.,
opusthinkingbucketedwith buckets[email protected]and[email protected]) hit a rate limit (429) or quota error (402), the system was expected to fail over to the next bucket in the list. Instead, it was throwing the error and exiting to the prompt without any failover occurring.Root Cause
The bucket failover logic in
retry.tshad two critical issues:High failover threshold: The
failoverThresholdwas set to 2, meaning bucket failover only triggered after two consecutive 429 errors, not on the first one.Failover check after throw: The failover logic ran after the function checked whether to throw the error. When
maxAttemptswas low (or on first error), the throw would happen before failover was attempted.Missing 402/401 support:
Solution
Modified the retry logic in
packages/core/src/utils/retry.ts:failoverThresholdfrom 2 to 1 - bucket failover now happens on the first 429 errormaxAttemptsis lowBehavior
After this fix, the retry logic now correctly implements the expected bucket failover behavior:
Testing
Added comprehensive test coverage in
retry.test.ts:All tests pass, including:
This fix works for all code paths (main CLI, subagents, all providers) since the retry logic is shared across the codebase.
closes #1006
Summary by CodeRabbit
Bug Fixes
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.