## `.env.example`

# LangSmith API Key — shared fallback for source and target
# Get your key from: https://smith.langchain.com/ → Settings → API Keys
# Use a Service Key (lsv2_sk_...) for deployment access
LANGSMITH_API_KEY=lsv2_sk_your_api_key_here

# For cross-org migration: separate keys for source and target
# These override LANGSMITH_API_KEY when set
# LANGSMITH_SOURCE_API_KEY=lsv2_sk_source_org_key
# LANGSMITH_TARGET_API_KEY=lsv2_sk_target_org_key

# PostgreSQL Database URL (optional, for --export-postgres)
# Example: postgresql://user:password@localhost:5432/dbname
DATABASE_URL=
# LangGraph Threads Migration Tool

A Python tool to export, back up, and migrate threads with full checkpoint history between LangGraph Cloud deployments.

## Why This Tool?

LangGraph Cloud stores your conversation threads and checkpoints, but there's no built-in way to:
- **Back up your data** before deleting a deployment
- **Migrate conversations** between environments (prod → staging, or across orgs)
- **Store threads in your own database** for analytics or compliance
- **Download conversation history** as JSON for processing

This tool solves all of these problems.

## Features

- **Export to JSON** — Streaming writes, memory-efficient even for multi-GB exports
- **Export to PostgreSQL** — Store threads in your own database with proper schema
- **Migrate between deployments** — Full migration with supersteps (preserves checkpoint chains) and automatic legacy fallback
- **Concurrent fetching** — Configurable parallelism (default: 5 threads) for fast exports
- **Per-page retry** — Automatic retries with exponential backoff on paginated API calls
- **History pagination** — Correct cursor format (`{"configurable": {"checkpoint_id": ...}}`) for complete checkpoint retrieval
- **Legacy import fix** — `--legacy-terminal-node` sets `next=[]` on threads imported via fallback mode
- **Rich progress bars** — Real-time progress with thread counts, elapsed time, and per-thread details
- **Metadata filtering** — Export only threads matching specific metadata (e.g., by workspace)
- **Dry-run & test modes** — Preview changes or test with a single thread before full operations
- **Cross-org support** — Separate API keys for source and target deployments
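
The concurrent-fetching behavior can be sketched with an `asyncio.Semaphore` (a minimal illustration, where `fetch_one` is a hypothetical stand-in for the real SDK call):

```python
import asyncio

async def fetch_all(thread_ids, concurrency=5):
    # Cap in-flight requests, mirroring the --concurrency flag
    sem = asyncio.Semaphore(concurrency)

    async def fetch_one(tid):
        async with sem:
            # Stand-in for the real SDK call (e.g. client.threads.get)
            await asyncio.sleep(0.01)
            return {"thread_id": tid}

    # gather() preserves input order, so results line up with thread_ids
    return await asyncio.gather(*(fetch_one(t) for t in thread_ids))

results = asyncio.run(fetch_all([f"t-{i}" for i in range(12)]))
```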

## Installation

```bash
# Clone the repository
git clone https://github.com/farouk09/langgraph-threads-migration.git
cd langgraph-threads-migration

# Using uv (recommended)
uv venv && source .venv/bin/activate
uv pip install -r requirements.txt

# Or using pip
pip install -r requirements.txt
```

## Configuration

Create a `.env` file:

```bash
cp .env.example .env
```

Edit `.env` with your credentials:

```bash
# Required: LangSmith API key (shared fallback for source and target)
LANGSMITH_API_KEY=lsv2_sk_your_api_key_here

# For cross-org migration: separate keys override the shared key
# LANGSMITH_SOURCE_API_KEY=lsv2_sk_source_org_key
# LANGSMITH_TARGET_API_KEY=lsv2_sk_target_org_key

# Optional: PostgreSQL connection URL (for --export-postgres)
DATABASE_URL=postgresql://user:password@localhost:5432/dbname
```

## Usage

### Export to JSON

```bash
python migrate_threads.py \
  --source-url https://my-deployment.langgraph.app \
  --export-json threads_backup.json
```

### Export to PostgreSQL

Store threads in your own database:

```bash
python migrate_threads.py \
--source-url https://my-deployment.langgraph.app \
--export-postgres
```

This creates two tables:
- `langgraph_threads` - Thread metadata and current state
- `langgraph_checkpoints` - Full checkpoint history

### Full migration (export + import + validate)

```bash
# Same org (shared API key)
python migrate_threads.py \
--source-url https://old-deploy.langgraph.app \
--target-url https://new-deploy.langgraph.app \
--full

# Cross-org (separate API keys)
python migrate_threads.py \
--source-url https://org1.langgraph.app \
--target-url https://org2.langgraph.app \
--source-api-key lsv2_sk_source... \
--target-api-key lsv2_sk_target... \
--full
```

### Import from JSON

Restore threads from a backup file:

```bash
python migrate_threads.py \
--target-url https://my-deployment.langgraph.app \
--import-json threads_backup.json
```

### Import with legacy terminal node fix

When importing threads that fall back to legacy mode (no supersteps), the thread's `next` field may point to the graph's entry node instead of being empty. Use `--legacy-terminal-node` to specify your graph's terminal node, which sets `next=[]` so threads are continuable:

```bash
python migrate_threads.py \
--target-url https://my-deployment.langgraph.app \
--import-json backup.json \
--legacy-terminal-node "MyLastNode.after_handler"
```

### Filter by metadata

```bash
# Export threads for a specific workspace
python migrate_threads.py \
--source-url https://my-deployment.langgraph.app \
--export-json workspace_4.json \
--metadata-filter '{"workspace_id": 4}'
```

### Test with a single thread first

```bash
python migrate_threads.py \
  --source-url https://my-deployment.langgraph.app \
  --export-json backup.json \
  --test-single
```

## Command-Line Options

| Option | Description |
|----------|-------------|
| `--source-url` | Source LangGraph Cloud deployment URL |
| `--target-url` | Target LangGraph Cloud deployment URL |
| `--api-key` | Shared API key fallback (or `LANGSMITH_API_KEY` env var) |
| `--source-api-key` | Source API key for cross-org (or `LANGSMITH_SOURCE_API_KEY`) |
| `--target-api-key` | Target API key for cross-org (or `LANGSMITH_TARGET_API_KEY`) |
| `--database-url` | PostgreSQL URL (or `DATABASE_URL` env var) |
| `--export-json FILE` | Export threads to JSON file |
| `--export-postgres` | Export threads to PostgreSQL database |
| `--import-json FILE` | Import threads from JSON file |
| `--migrate` | Migrate threads (export + import) |
| `--full` | Full migration (export + import + validate) |
| `--validate` | Compare source vs target thread counts |
| `--dry-run` | Simulation mode (no changes made) |
| `--test-single` | Process only one thread (for testing) |
| `--metadata-filter JSON` | Filter threads by metadata (JSON object) |
| `--history-limit N` | Max checkpoints per thread (default: all) |
| `--concurrency N` | Parallel thread fetches (default: 5) |
| `--legacy-terminal-node NODE` | Graph terminal node name for legacy imports (sets `next=[]`) |
| `--backup-file FILE` | Backup file path (default: `threads_backup.json`) |

## Import Strategies

The tool uses two import strategies, with automatic fallback:

### 1. Supersteps (preferred)
Replays state changes via `threads.create(supersteps=...)`, preserving the full checkpoint chain. This enables time-travel operations on the target deployment.

### 2. Legacy fallback
When supersteps fail (e.g., due to incompatible serialized objects in old checkpoints), the tool falls back to `create_thread()` + `update_thread_state()`. This preserves the final state but creates only a single checkpoint.

**Known issue with legacy import**: Without `--legacy-terminal-node`, threads imported in legacy mode may have `next=['SomeMiddleware.before_handler']` instead of `next=[]`, making them non-continuable. The flag fixes this by telling LangGraph which node "last ran".
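
As a rough sketch of the preferred strategy, each exported checkpoint becomes one superstep replayed on the target. The exact payload shape accepted by `threads.create(supersteps=...)` is an assumption here, so treat this as illustrative:

```python
def build_supersteps(checkpoints):
    """Turn an exported checkpoint history (oldest first) into a
    supersteps-style payload: one superstep per checkpoint. The exact
    shape expected by threads.create(supersteps=...) is assumed."""
    supersteps = []
    for cp in checkpoints:
        supersteps.append({
            "updates": [{
                "values": cp.get("values", {}),
                # Which node "produced" this state; defaults to the input node
                "as_node": cp.get("metadata", {}).get("source", "__input__"),
            }]
        })
    return supersteps

payload = build_supersteps([
    {"values": {"messages": ["hi"]}, "metadata": {"source": "input"}},
    {"values": {"messages": ["hi", "hello!"]}, "metadata": {"source": "loop"}},
])
```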

## Key Bug Fixes (vs upstream)

### History pagination cursor format
The LangGraph Cloud API expects the `before` cursor in the format `{"configurable": {"checkpoint_id": "..."}}`, not `{"checkpoint_id": "..."}`. The incorrect format caused silent 500 errors on every page after the first, resulting in incomplete exports. This fix is critical for any thread with more than 100 checkpoints.
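
The fix can be illustrated with a paging loop. `FakeThreads` is a toy stand-in for `client.threads` so the sketch is runnable, and the `get_history` signature is assumed from the langgraph_sdk client:

```python
import asyncio

class FakeThreads:
    """Toy stand-in for client.threads, just to make the loop runnable."""
    def __init__(self, checkpoints):
        self._cps = checkpoints

    async def get_history(self, thread_id, limit=100, before=None):
        start = 0
        if before is not None:
            # The server reads the id from under "configurable"
            target = before["configurable"]["checkpoint_id"]
            ids = [c["checkpoint"]["checkpoint_id"] for c in self._cps]
            start = ids.index(target) + 1
        return self._cps[start:start + limit]

async def fetch_full_history(threads, thread_id, page_size=100):
    history, before = [], None
    while True:
        page = await threads.get_history(thread_id, limit=page_size, before=before)
        history.extend(page)
        if len(page) < page_size:
            return history
        # Nested cursor format; a bare {"checkpoint_id": ...} fails
        before = {"configurable": {
            "checkpoint_id": page[-1]["checkpoint"]["checkpoint_id"]}}

cps = [{"checkpoint": {"checkpoint_id": f"cp-{i}"}} for i in range(250)]
full = asyncio.run(fetch_full_history(FakeThreads(cps), "thread-1"))
```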

### JSON parsing of agent messages
Agent messages may contain unescaped control characters (e.g., `\n` inside JSON strings). The exporter now uses `strict=False` when loading JSON backups to handle these correctly.
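
For instance, Python's built-in `json` module rejects raw control characters inside strings unless `strict=False` is passed:

```python
import json

# A raw newline inside a JSON string is technically invalid JSON,
# but shows up in exported agent messages.
raw = '{"content": "line one\nline two"}'

# With the default strict=True this raises JSONDecodeError
msg = json.loads(raw, strict=False)
```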

## PostgreSQL Schema

When using `--export-postgres`, the tool creates:

```sql
CREATE TABLE langgraph_threads (
id SERIAL PRIMARY KEY,
thread_id VARCHAR(255) UNIQUE NOT NULL,
metadata JSONB,
"values" JSONB,
status VARCHAR(50),
created_at TIMESTAMP,
updated_at TIMESTAMP,
exported_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE langgraph_checkpoints (
id SERIAL PRIMARY KEY,
thread_id VARCHAR(255) REFERENCES langgraph_threads(thread_id),
checkpoint_id VARCHAR(255) NOT NULL,
checkpoint JSONB,
metadata JSONB,
exported_at TIMESTAMP DEFAULT NOW()
);
```

## Example Output

```
╭────────────────────────────────────────╮
│ 🔄 LangGraph Threads Export Tool │
╰────────────────────────────────────────╯

╭─────────────────────────────────────────╮
│ Phase 1: Export threads from source │
╰─────────────────────────────────────────╯
✓ 66 threads found
✓ JSON backup saved: threads_backup.json
✓ Size: 29.30 MB
✓ Total checkpoints exported: 842
✓ PostgreSQL: 66 threads, 842 checkpoints
```

## Project Structure

```
langgraph-threads-migration/
├── migrate_threads.py # CLI entry point (Rich progress bars)
├── langgraph_export/
│ ├── __init__.py
│ ├── client.py # LangGraph SDK wrapper (per-page retry, cursor fix)
│ ├── migrator.py # Migration orchestrator (concurrent fetch, supersteps, legacy)
│ ├── models.py # SQLAlchemy models for PostgreSQL
│ └── exporters/
│ ├── base.py # Abstract base exporter
│ ├── json_exporter.py # Streaming JSON export
│ └── postgres_exporter.py # PostgreSQL export
├── requirements.txt
├── .env.example
└── LICENSE
```

## Troubleshooting

| Error | Solution |
|-------|----------|
| `PermissionDeniedError` | Use Service Key (`lsv2_sk_...`), not Personal Token |
| `ConflictError (409)` | Thread already exists on target (automatically skipped) |
| `asyncpg not installed` | Run `pip install asyncpg` for PostgreSQL support |
| `KeyError: 'configurable'` | Server-side error from wrong pagination cursor — fixed in this fork |
| `next != []` after import | Use `--legacy-terminal-node` with your graph's terminal node |

## Important Notes

### Authentication
If your LangGraph deployment uses custom authentication (e.g., Auth0), you may need to temporarily disable it during export. Remember to re-enable it afterwards.

### Multi-tenancy
The tool preserves thread metadata including `owner`, so each user will only see their own threads after migration.

### Rate Limiting
Built-in delays (0.1s) between API calls prevent overload. Failed calls are retried up to 3 times with exponential backoff + jitter.
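
The retry policy amounts to something like the following sketch (the real logic lives in `langgraph_export/client.py`; delays here are shortened for illustration):

```python
import random
import time

def with_retries(fn, attempts=3, base_delay=0.5):
    # Exponential backoff (base_delay * 2**attempt) plus random jitter
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))

calls = {"n": 0}

def flaky():
    # Fails twice, then succeeds, like a transient 500 from the API
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"

result = with_retries(flaky, base_delay=0.01)
```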

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

MIT License - see [LICENSE](LICENSE) file for details.

## Acknowledgments

Built for the [LangGraph](https://github.com/langchain-ai/langgraph) community.