-
-
Notifications
You must be signed in to change notification settings - Fork 836
fix(sdk): always pass deterministic session ID to prevent orphaned files #516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Fixes #514 - Excessive observer sessions created during startup-recovery Root cause: When memorySessionId was null or equaled contentSessionId (placeholder), no `resume` parameter was passed to the SDK's query(). This caused the SDK to create a NEW session file (.jsonl) on every call. If queries aborted before capturing the SDK's session_id, the placeholder remained, leading to cascading creation of 13,000+ orphaned files. Fix: - Generate deterministic ID `mem-${contentSessionId}` upfront - Always pass it to `resume` parameter - Persist immediately to database before query starts - If SDK returns different ID, capture and use that going forward This ensures one session file per contentSessionId, eliminating orphans. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Code Review: PR #516 - Fix Orphaned Session FilesSummaryThis PR addresses issue #514 by fixing excessive observer session creation during startup-recovery. The solution generates a deterministic memory session ID upfront and always passes it to the SDK's ✅ Strengths1. Root Cause AnalysisThe PR correctly identifies that the bug occurred when 2. Deterministic ID GenerationThe new approach using
3. Immediate PersistenceThe fix persists the memory session ID to the database before starting the SDK query (lines 77-80), which is critical for:
4. Fallback HandlingThe code gracefully handles the case where the SDK returns a different session ID than requested (lines 106-118), logging a warning and updating to use the SDK's ID. This defensive programming is excellent. 5. Code Quality
🔍 Potential Issues & Concerns1. CRITICAL: Potential UNIQUE Constraint ViolationLocation: The database schema (from SessionStore.ts:79) defines: memory_session_id TEXT UNIQUEThe Problem: If another session already exists with
Evidence: Looking at SessionStore.ts:1195: INSERT OR IGNORE INTO sdk_sessions
(content_session_id, memory_session_id, project, user_prompt, started_at, started_at_epoch, status)
VALUES (?, ?, ?, ?, ?, ?, 'active')
`).run(contentSessionId, contentSessionId, project, userPrompt, now.toISOString(), nowEpoch);The second parameter sets Suggested Fix: // Generate deterministic memory session ID if not already set
const memorySessionId = (session.memorySessionId && session.memorySessionId !== session.contentSessionId)
? session.memorySessionId
: `mem-${session.contentSessionId}`;
// Persist immediately if we just generated it
if (!session.memorySessionId || session.memorySessionId === session.contentSessionId) {
session.memorySessionId = memorySessionId;
// Use try-catch to handle potential UNIQUE constraint violations
try {
this.dbManager.getSessionStore().updateMemorySessionId(session.sessionDbId, memorySessionId);
} catch (error) {
logger.warn('SDK', 'Failed to update memory session ID, possibly due to constraint violation', {
sessionDbId: session.sessionDbId,
memorySessionId,
error: String(error)
});
// Fallback: if update fails, let the SDK generate its own ID naturally
}
}2. Missing Test CoverageIssue: There are no tests validating this new behavior, specifically:
Recommendation: Add tests to describe('Deterministic Memory Session ID Generation (Issue #514)', () => {
it('should generate deterministic mem- prefixed ID on first run', () => {
const contentSessionId = 'test-content-session';
const sessionDbId = store.createSDKSession(contentSessionId, 'test-project', 'Test');
// Simulate SDKAgent generating deterministic ID
const deterministicId = `mem-${contentSessionId}`;
store.updateMemorySessionId(sessionDbId, deterministicId);
const session = store.getSessionById(sessionDbId);
expect(session?.memory_session_id).toBe(deterministicId);
});
it('should not update if memory_session_id already differs from content_session_id', () => {
const contentSessionId = 'existing-session';
const existingMemoryId = 'sdk-captured-id-abc';
const sessionDbId = store.createSDKSession(contentSessionId, 'test-project', 'Test');
store.updateMemorySessionId(sessionDbId, existingMemoryId);
// Verify it doesn't get overwritten by deterministic generation
const session = store.getSessionById(sessionDbId);
expect(session?.memory_session_id).toBe(existingMemoryId);
expect(session?.memory_session_id).not.toBe(`mem-${contentSessionId}`);
});
});3. Documentation GapIssue: The comment on line 187 in SessionStore.ts references the old conditional resume logic that was removed: // NOTE: memory_session_id is initialized to contentSessionId as a placeholder for FK purposes.
// The REAL memory session ID is captured by SDKAgent from the first SDK response
// and stored via updateMemorySessionId(). The resume logic checks if memorySessionId
// differs from contentSessionId before using it - see SDKAgent.startSession().This comment is now outdated. The new behavior generates a deterministic ID upfront, not from the SDK response. Recommendation: Update the comment to reflect the new behavior: // NOTE: memory_session_id is initialized to contentSessionId as a placeholder.
// SDKAgent generates a deterministic ID (mem-${contentSessionId}) before starting
// the query to prevent orphaned session files (Issue #514). If the SDK returns
// a different ID, SDKAgent will update to use that instead.4. Edge Case: Concurrent UpdatesScenario: If multiple workers or threads try to update the same session's Current Risk: Low, because typically only one SDKAgent processes a given session at a time, but worth documenting or adding a database transaction if needed. 5. Logging VerbosityMinor Issue: The warning log on line 107 will fire even in normal operation if the SDK chooses to generate its own ID instead of using the deterministic one. This might create noise in logs. Recommendation: Consider downgrading to logger.info('SDK', 'SDK returned different session ID, updating to use SDK-provided ID', {
sessionDbId: session.sessionDbId,
requested: session.memorySessionId,
received: message.session_id
});🔒 Security ConsiderationsNo security issues identified. The deterministic ID generation doesn't expose sensitive information and maintains proper database constraints. 🚀 Performance ConsiderationsPositive Impact:
Neutral:
✅ Test Plan VerificationThe PR's test plan is good but could be strengthened:
📋 RecommendationsHigh Priority
Medium Priority
Low Priority
🎯 VerdictApproval Status: ✅ APPROVE with recommended fixes This PR successfully addresses the critical bug that was causing thousands of orphaned session files. The core solution is sound and well-implemented. The concerns raised (especially the UNIQUE constraint issue) should be addressed before merging, but the overall approach is excellent. The fix demonstrates good understanding of the SDK lifecycle and implements a clever solution to ensure session resumption works correctly even across worker restarts. Impact:
Great work on diagnosing and fixing this complex issue! 🎉 Review completed by Claude Code Assistant on 2026-01-01 |
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
PR Review: Fix for Orphaned Session Files (Issue #514)SummaryThis PR successfully addresses the critical bug where 13,000+ orphaned .jsonl files were created during startup-recovery. The fix implements deterministic session ID generation and ensures the Code Quality: ✅ ExcellentStrengths
Root Cause AnalysisThe PR correctly identifies and fixes the core issue:
This prevents the SDK from creating new session files on every call. Code Review by Section1. Session ID Generation (Lines 69-80)const memorySessionId = (session.memorySessionId && session.memorySessionId !== session.contentSessionId)
? session.memorySessionId
: `mem-${session.contentSessionId}`;✅ Correct Logic:
Potential Edge Case (minor): 2. Resume Parameter (Lines 90-100)const queryResult = query({
prompt: messageGenerator,
options: {
model: modelId,
resume: memorySessionId, // Always pass - prevents orphaned sessions
disallowedTools,
// ...
}
});✅ Critical Fix: Always passing 3. SDK Session ID Validation (Lines 104-118)if (message.session_id && message.session_id !== session.memorySessionId) {
logger.warn('SDK', 'SDK returned different session ID than requested', {
sessionDbId: session.sessionDbId,
requested: session.memorySessionId,
received: message.session_id
});
session.memorySessionId = message.session_id;
this.dbManager.getSessionStore().updateMemorySessionId(
session.sessionDbId,
message.session_id
);
}✅ Excellent defensive programming:
Potential Issues & Edge Cases1. Session ID Format Validation
|
- Removed deterministic memory session ID generation logic. - Updated query options to use contentSessionId for resume parameter. - Eliminated checks for SDK returning different session IDs. - Streamlined logging and session ID updates for clarity and efficiency.
Code ReviewSummaryThis PR simplifies session ID management in the SDKAgent by using ✅ Positive Aspects
|
Summary
Fixes #514 - Excessive observer sessions created during startup-recovery (13,000+ orphaned .jsonl files)
Root Cause
When
memorySessionIdwas null or equaledcontentSessionId(placeholder), noresumeparameter was passed to the SDK'squery(). This caused the SDK to create a new session file on every call. If queries aborted before capturing the SDK's session_id, the placeholder remained, leading to cascading creation of orphaned files.Changes
mem-${contentSessionId}upfront instead of waiting to capture from SDKresumeparameter (no more conditional spreading)Files Changed
src/services/worker/SDKAgent.ts- Session ID generation and resume logicTest plan
npm run buildgrep -n "hasRealMemorySessionId" src/services/worker/SDKAgent.tsreturns nothinggrep -n 'mem-\${session.contentSessionId}' src/services/worker/SDKAgent.tsreturns line 74grep -n "resume:" src/services/worker/SDKAgent.tsshows unconditional usage🤖 Generated with Claude Code