feat: concurrent export, pagination bugfix, streaming writes, legacy import fix #1
Open
MarioAlessandroNapoli wants to merge 4 commits into farouk09:main
Replace the current create+update_state import (which loses checkpoint history) with a supersteps-based import that preserves time-travel.

Two conversion strategies:
- Direct: uses metadata.writes when available (local checkpointer)
- Delta: computes state diffs between consecutive checkpoints (Cloud API)

The delta approach skips middleware no-ops and only generates supersteps for state-changing steps (e.g. 100 checkpoints → 30 supersteps).

Includes automatic fallback to legacy import if the target API doesn't support supersteps (a 400/422 response switches to legacy import for all remaining threads).

Validation now optionally compares checkpoint history counts for a sample thread (enabled with --test-single in full migration).
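A minimal sketch of the delta strategy described above, assuming each checkpoint exposes its full state under a `values` key and that checkpoints are ordered oldest first. The function names and the simplified superstep shape are illustrative, not the PR's actual code, and the sketch ignores removed keys and reducer semantics (for example append-only message lists).

```python
from typing import Any


def state_delta(prev: dict[str, Any], curr: dict[str, Any]) -> dict[str, Any]:
    """Return the keys of `curr` that are new or changed relative to `prev`."""
    return {k: v for k, v in curr.items() if k not in prev or prev[k] != v}


def checkpoints_to_supersteps(checkpoints: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Turn consecutive checkpoints (oldest first) into state-changing supersteps.

    Checkpoints whose state equals the previous one (no-ops) produce no superstep.
    The {"updates": [{"values": ...}]} shape is a simplified assumption.
    """
    supersteps: list[dict[str, Any]] = []
    prev: dict[str, Any] = {}
    for checkpoint in checkpoints:
        delta = state_delta(prev, checkpoint["values"])
        if delta:
            supersteps.append({"updates": [{"values": delta}]})
        prev = checkpoint["values"]
    return supersteps
```

Because only non-empty deltas are emitted, a history dominated by no-op checkpoints collapses to far fewer supersteps than checkpoints.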
… (AE-260) Add --source-api-key and --target-api-key CLI flags with LANGSMITH_SOURCE_API_KEY and LANGSMITH_TARGET_API_KEY env vars. Backward compatible: LANGSMITH_API_KEY used as fallback for both.
…ing JSON (AE-261, AE-262)
- Add --metadata-filter for server-side JSONB containment filtering
- Add --history-limit for optional checkpoint cap per thread
- Retry API calls with exponential backoff + jitter (3 attempts)
- Paginate checkpoint history via `before` cursor (no more limit=100)
- Rewrite JSONExporter for streaming writes (no in-memory buffering)
- Update README with new flags, examples, and feature docs
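The retry behaviour listed above could look roughly like the following. This is a sketch, not the PR's implementation: `retry_async`, its parameters, and the delay values are assumptions chosen for illustration, and a real client would likely catch only transient error types rather than every exception.

```python
import asyncio
import random


async def retry_async(call, attempts=3, base_delay=0.5, max_delay=8.0):
    """Await `call()` up to `attempts` times with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return await call()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == attempts - 1:
                raise
            # Delay doubles each attempt (capped), plus random jitter so that
            # concurrent callers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay))
```

`call` is a zero-argument function returning an awaitable, so the request is rebuilt on every attempt.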
…import fix
Critical bugfix:
- Fix history pagination cursor format: server expects
{"configurable": {"checkpoint_id": ...}}, not flat {"checkpoint_id": ...}.
Without this fix, every page after the first returns a 500 error,
silently truncating exports to ~100 checkpoints per thread.
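To illustrate the cursor format, here is a hedged sketch of a pagination loop. `fetch_page` is a hypothetical stand-in for the client's history call, and reading the checkpoint ID from `state["checkpoint"]["checkpoint_id"]` is an assumption about the response shape; only the nested `configurable` cursor format comes from this PR.

```python
from typing import Any, Awaitable, Callable, Optional

Cursor = Optional[dict[str, Any]]


def before_cursor(checkpoint_id: str) -> dict[str, Any]:
    """Build the nested cursor the server expects (not the flat form)."""
    return {"configurable": {"checkpoint_id": checkpoint_id}}


async def fetch_full_history(
    fetch_page: Callable[[Cursor, int], Awaitable[list[dict[str, Any]]]],
    page_size: int = 100,
) -> list[dict[str, Any]]:
    """Collect every checkpoint by paging backwards with a `before` cursor."""
    history: list[dict[str, Any]] = []
    before: Cursor = None
    while True:
        page = await fetch_page(before, page_size)
        history.extend(page)
        # A short (or empty) page means there is nothing older to fetch.
        if len(page) < page_size:
            return history
        # Assumed response shape: each state carries its checkpoint ID here.
        before = before_cursor(page[-1]["checkpoint"]["checkpoint_id"])
```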
Performance:
- Concurrent thread fetching with asyncio.Semaphore (configurable --concurrency, default 5)
- Streaming JSON export via producer-consumer queue (constant memory usage)
- Per-page retry with exponential backoff + jitter (instead of retrying entire history)
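A rough sketch of how bounded concurrency and a producer-consumer queue can be combined for a streaming export. `export_threads`, `fetch_thread`, and the JSON-array output layout are assumptions for illustration and may differ from the PR's exporter; file writes here are plain blocking calls for brevity.

```python
import asyncio
import json


async def export_threads(thread_ids, fetch_thread, out_path, concurrency=5):
    """Fetch threads concurrently and stream each record to disk as it arrives."""
    semaphore = asyncio.Semaphore(concurrency)
    # A bounded queue keeps memory bounded: producers wait when the writer lags.
    queue = asyncio.Queue(maxsize=concurrency * 2)

    async def producer(thread_id):
        async with semaphore:  # at most `concurrency` fetches in flight
            record = await fetch_thread(thread_id)
        await queue.put(record)

    async def consumer():
        count = 0
        with open(out_path, "w", encoding="utf-8") as f:
            f.write("[")
            while True:
                record = await queue.get()
                if record is None:  # sentinel: all producers are done
                    break
                if count:
                    f.write(",")
                f.write(json.dumps(record))
                count += 1
            f.write("]")
        return count

    writer = asyncio.create_task(consumer())
    try:
        await asyncio.gather(*(producer(t) for t in thread_ids))
    finally:
        await queue.put(None)  # always let the writer close the file
    return await writer
```

Records are written in completion order, so the output order may differ from the input order.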
New features:
- --legacy-terminal-node: specify graph terminal node for legacy imports,
ensuring next=[] so threads are continuable after migration
- --concurrency N: control parallel thread fetches
- Rich progress bars with per-thread detail and elapsed time
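For the --legacy-terminal-node flag above, the legacy path might be sketched as follows. This assumes a LangGraph SDK-style async client exposing `threads.create()` and `threads.update_state(..., as_node=...)`; the helper name `legacy_import` and its parameters are hypothetical, not the tool's actual function.

```python
async def legacy_import(client, final_values, terminal_node=None, metadata=None):
    """Create a thread and write its final state in a single update.

    When `terminal_node` is given, it is passed as `as_node` so the update is
    attributed to the graph's terminal node, which should leave nothing
    scheduled to run next.
    """
    thread = await client.threads.create(metadata=metadata or {})
    # Only pass as_node when a terminal node was requested.
    kwargs = {"as_node": terminal_node} if terminal_node else {}
    await client.threads.update_state(thread["thread_id"], final_values, **kwargs)
    return thread["thread_id"]
```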
Fixes:
- JSON loader uses strict=False to handle control characters in agent messages
- Removed unused imports and dead code
- Updated README with complete command reference and troubleshooting
Summary
This PR addresses several critical issues discovered during a production migration of 165 threads (12,904 checkpoints, 2.1 GB) between LangGraph Cloud deployments.
Critical bugfix: history pagination cursor format
The `before` cursor for get_history() pagination must use the format {"configurable": {"checkpoint_id": "..."}}, not the flat {"checkpoint_id": "..."}. Without this fix, every page after the first returns a server-side KeyError: 'configurable' (500 error), silently truncating exports to ~100 checkpoints per thread. This is the root cause of slow/incomplete exports on any thread with significant history.
Performance improvements
- Concurrent thread fetching with asyncio.Semaphore (configurable --concurrency, default 5): reduced export time from >1 hour to ~15 minutes for 165 threads
- Streaming JSON export via asyncio.Queue: constant memory usage instead of buffering all data (critical for multi-GB exports)
New features
- --legacy-terminal-node NODE: When supersteps import fails (e.g., due to incompatible serialized objects in old checkpoints) and the tool falls back to create_thread() + update_thread_state(), the imported threads may have next=['SomeNode'] instead of next=[], making them non-continuable. This flag passes as_node to update_thread_state() to correctly set the terminal state. This is a LangGraph behavior (not a bug): update_state without as_node defaults to the graph's entry node for routing.
- --concurrency N: Control parallel thread fetches
Other fixes
- json.loads(..., strict=False) in JSON loader to handle control characters in agent messages
- Removed unused imports (time, Live, Layout, TaskProgressColumn)
Changes
- client.py
- migrator.py: --legacy-terminal-node support
- migrate_threads.py
- json_exporter.py: strict=False for JSON loading
- README.md
Test plan
- --legacy-terminal-node fixes next=[] on legacy imports
- --test-single, --dry-run, --metadata-filter flags
- strict=False (control characters in agent messages)
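The strict=False item can be checked in isolation with the standard library alone. The sample payload below is made up for illustration; it contains a raw newline inside a JSON string, which the default strict parser rejects.

```python
import json


def load_lenient(text: str):
    """Parse JSON while allowing raw control characters inside strings."""
    return json.loads(text, strict=False)


# The "\n" here is a real newline character embedded in the JSON string value.
raw = '{"content": "line one\nline two"}'

try:
    json.loads(raw)  # default strict=True rejects the control character
except json.JSONDecodeError as exc:
    print("strict parse failed:", exc.msg)

print(load_lenient(raw)["content"])
```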