fix: prevent infinite restart loop that causes runaway API costs #693

ajbmachon · 2026-01-12T12:51:25Z

Problem

When memorySessionId is null (due to a race condition during session initialization), the generator enters an infinite restart loop:

1. Generator starts → Calls LLM API ✅ (tokens charged)
2. processAgentResponse() → throws "memorySessionId not yet captured" ❌
3. .finally() sees pendingCount > 0 → Restarts generator after 1 second
4. Go to step 1 (NO RETRY LIMIT!)

Real-World Impact

I experienced $402 in wasted API costs before my OpenRouter key limit stopped it. The logs showed:

[ERROR] ✗ OpenRouter agent error - Cannot store observations: memorySessionId not yet captured
[INFO ] Restarting generator after crash/exit with pending work {pendingCount=429}

This repeated 24,219 times in a single day. Each restart made another LLM call that succeeded (cost tokens) but then failed to store the result.

Solution

1. Add Restart Limit (Primary Fix)

// SessionRoutes.ts
const MAX_CONSECUTIVE_RESTARTS = 3;

if (session.consecutiveRestarts > MAX_CONSECUTIVE_RESTARTS) {
  logger.error('SESSION', 'CRITICAL: Generator restart limit exceeded - stopping to prevent runaway costs');
  session.abortController.abort();
  return;
}
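
The snippet above does not show where the counter is incremented or reset. A minimal sketch of how the pieces might fit together, assuming the counter is incremented before the check (which would match the 1s → 2s → 4s backoff and the "3 retries" described in this PR). `SessionSketch`, `registerRestart` and `registerSuccess` are hypothetical names for illustration, not identifiers from the codebase:

```typescript
// Hypothetical, trimmed-down session shape; the real ActiveSession has more fields.
interface SessionSketch {
  consecutiveRestarts: number;
  abortController: AbortController;
}

const MAX_CONSECUTIVE_RESTARTS = 3;

/**
 * Called when the generator exits with pending work.
 * Returns true if another restart is allowed, false if the session was aborted.
 */
function registerRestart(session: SessionSketch): boolean {
  session.consecutiveRestarts += 1; // assumption: count the attempt before checking
  if (session.consecutiveRestarts > MAX_CONSECUTIVE_RESTARTS) {
    session.abortController.abort(); // give up instead of restarting again
    return false;
  }
  return true;
}

/** Called after a generator run finishes without error. */
function registerSuccess(session: SessionSketch): void {
  session.consecutiveRestarts = 0; // only consecutive failures count toward the limit
}

// Four failures in a row: three restarts are allowed, the fourth aborts.
const session: SessionSketch = {
  consecutiveRestarts: 0,
  abortController: new AbortController(),
};
const decisions = [1, 2, 3, 4].map(() => registerRestart(session));
console.log(decisions); // [ true, true, true, false ]
```

Resetting on success is what keeps the limit from penalizing a long-lived session that hits a few unrelated transient failures over its lifetime.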

2. Add Exponential Backoff

// Backoff: 1s → 2s → 4s
const backoffMs = Math.min(1000 * Math.pow(2, session.consecutiveRestarts - 1), 8000);
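
To make the schedule concrete, here is the same formula wrapped in a function and evaluated for each allowed restart. The function name `backoffForRestart` and the constant `MAX_BACKOFF_MS` are illustrative only; the formula and the 8000 ms cap come from the line above:

```typescript
const MAX_BACKOFF_MS = 8000; // cap taken from the snippet above

// consecutiveRestarts is assumed to be 1 for the first restart, 2 for the second, etc.
function backoffForRestart(consecutiveRestarts: number): number {
  return Math.min(1000 * Math.pow(2, consecutiveRestarts - 1), MAX_BACKOFF_MS);
}

const delays = [1, 2, 3].map(backoffForRestart);
console.log(delays); // [ 1000, 2000, 4000 ]
```

Two things follow from the formula: with a limit of 3 restarts the 8000 ms cap is never reached, and a counter value of 0 would yield 500 ms, so the schedule only matches "1s → 2s → 4s" if the counter is incremented before the delay is computed.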

3. Fail Fast - Check Before LLM Call (Defensive)

// OpenRouterAgent.ts and GeminiAgent.ts
if (!session.memorySessionId) {
  throw new Error('Cannot process observations: memorySessionId not yet captured');
}
// THEN call expensive LLM
await this.queryOpenRouterMultiTurn(...);

This check happens before the LLM call, preventing token waste.
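
A small self-contained sketch of why the ordering matters. `callLlm` is a hypothetical stand-in for the paid request (it is not the real `queryOpenRouterMultiTurn`), and `AgentSessionSketch` is an assumed, trimmed-down session shape:

```typescript
// Hypothetical session shape holding only the field the guard needs.
interface AgentSessionSketch {
  memorySessionId: string | null;
}

let llmCalls = 0; // counts how often the "expensive" call is actually made

// Stand-in for the paid LLM request.
async function callLlm(prompt: string): Promise<string> {
  llmCalls += 1;
  return `response to: ${prompt}`;
}

async function processObservation(
  session: AgentSessionSketch,
  prompt: string
): Promise<string> {
  // Guard first: if the result could not be stored anyway, do not pay for it.
  if (!session.memorySessionId) {
    throw new Error('Cannot process observations: memorySessionId not yet captured');
  }
  return callLlm(prompt);
}

// With a null memorySessionId the promise rejects and callLlm is never reached.
const rejected = processObservation({ memorySessionId: null }, 'summarize');
rejected.catch((err: Error) => {
  console.log(err.message, '| LLM calls made:', llmCalls);
});
```

The guard alone does not stop the restart loop, since the error still propagates to the restart logic. It only makes each failed iteration free, which is why it is paired with the restart limit.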

Files Changed

| File | Change |
| --- | --- |
| `worker-types.ts` | Add `consecutiveRestarts: number` to `ActiveSession` |
| `SessionManager.ts` | Initialize `consecutiveRestarts: 0` |
| `SessionRoutes.ts` | Add restart limit (3), exponential backoff, reset on success |
| `OpenRouterAgent.ts` | Check `memorySessionId` before LLM calls |
| `GeminiAgent.ts` | Same defensive check |

Testing

Build succeeds: npm run build ✅
Logic verified against original logs showing the infinite loop pattern
Fix preserves legitimate crash recovery for transient failures (3 retries)

Notes

The root cause (why memorySessionId isn't captured in some sessions) may still need investigation
This fix is a safety guardrail to prevent catastrophic costs regardless of root cause
Consider adding alerting/monitoring for when the restart limit is hit

Problem: When memorySessionId is null (race condition during session init), the generator would:
1. Call LLM API successfully (tokens charged)
2. Fail to store result ("memorySessionId not yet captured")
3. Restart after 1 second with NO retry limit
4. Repeat infinitely, accumulating massive API costs

One user reported $402 in wasted tokens before their API key limit was hit. The generator had restarted 24,000+ times in a single day.

Solution:
1. Add consecutiveRestarts counter to ActiveSession
2. Limit generator restarts to MAX_CONSECUTIVE_RESTARTS (3)
3. Add exponential backoff between restarts (1s, 2s, 4s)
4. Log CRITICAL error when limit exceeded
5. Add defensive checks in OpenRouterAgent and GeminiAgent to verify memorySessionId BEFORE calling expensive LLM APIs

This prevents the infinite loop while still allowing legitimate crash recovery for transient failures.

Fixes: runaway API costs when memorySessionId not captured

Cherry-picked from PR thedotmack#693:
- Add restart limit (max 3 consecutive restarts)
- Implement exponential backoff (1s -> 2s -> 4s)
- Add defensive validation for memorySessionId before API calls

Co-Authored-By: Claude Opus 4.5 <[email protected]>

denjzs · 2026-01-15T16:37:59Z

Related Issue:

PR:

fix: generate memorySessionId for stateless providers #615

thedotmack#693

thedotmack self-assigned this Jan 13, 2026

yrom added a commit to yrom/claude-mem that referenced this pull request Jan 16, 2026

Merge branch pr 'thedotmack#693' into dev

16d846d

thedotmack#693
