Description
Part of: #408
Story: Cluster Migrate Script (Seed Cluster from Single Server)
[Conversation Reference: "cluster migrate is to seed a cluster with the initial state of a working local server"]
Story Overview
Objective: Create a script that seeds a new cluster with the complete state of a working standalone CIDX server. This orchestrates: running the SQLite-to-PostgreSQL data migration (Story 10), copying golden repo files to shared ONTAP storage, and converting the standalone server into the first node of the cluster.
User Value: An existing production CIDX server can be converted to a cluster without downtime beyond a maintenance window. All data, repositories, indexes, and configuration are preserved.
Acceptance Criteria
AC1: End-to-End Migration Orchestration
Scenario: The script migrates a standalone server to become the first cluster node.
Given a working standalone CIDX server with data in SQLite and golden repos on local disk
When the cluster-migrate script is run
Then it stops the CIDX server service
And it runs the SQLite-to-PostgreSQL data migration tool (Story 10)
And it copies golden-repos/ and .versioned/ to the shared NFS mount
And it updates alias JSON files to reflect new NFS-based paths
And it updates config.json for cluster mode
And it restarts the CIDX server in cluster mode
And the server comes up with all previous data and repos accessible
Technical Requirements:
- Shell script: `scripts/cluster-migrate.sh`
- Prerequisite check: cluster-join.sh must have been run first (NFS mount, PostgreSQL configured)
- Stop server: `sudo systemctl stop cidx-server`
- Run data migration: `python3 -m code_indexer.server.tools.migrate_to_postgres`
- Copy repos: `rsync -a` from local golden-repos/ to NFS mount
- Update alias JSON target_path values for new base paths
- Start server: `sudo systemctl start cidx-server`
- Verification step after restart
AC2: Golden Repo File Migration to Shared Storage
Scenario: All golden repo files are moved to the ONTAP FSx NFS mount.
Given golden-repos/ contains cloned repositories and .versioned/ contains snapshots
When the file migration runs
Then all files under golden-repos/ are copied to <nfs_mount>/golden-repos/
And all files under .versioned/ are copied to <nfs_mount>/.versioned/
And all alias JSON files are copied and updated with new paths
And .code-indexer/index/ directories (vector indexes) are copied
And file permissions are preserved
And rsync is used for efficient, resumable copy
Technical Requirements:
- `rsync -av --progress` for copy with progress reporting
- Source: `~/.cidx-server/golden-repos/` and `~/.cidx-server/.versioned/`
- Destination: `<nfs_mount>/golden-repos/` and `<nfs_mount>/.versioned/`
- Resumable: if interrupted, re-run copies only changed files
- Space check: verify NFS mount has sufficient free space before starting
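The space check and copy described above could be sketched as follows. This is a minimal illustration, not the script's actual code: the function names (`dir_size_kb`, `free_space_kb`, `check_space`, `copy_tree`) and their argument order are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch only: function names and argument order are illustrative assumptions.

# Print the size of a directory tree in KiB (0 if it does not exist).
dir_size_kb() {
    local dir="$1"
    [ -d "$dir" ] || { echo 0; return 0; }
    du -sk "$dir" | awk '{print $1}'
}

# Print the free space, in KiB, of the filesystem holding the given path.
free_space_kb() {
    df -Pk "$1" | awk 'NR == 2 {print $4}'
}

# check_space <dest> <src>...: succeed only if <dest> has room for every <src>.
check_space() {
    local dest="$1"; shift
    local needed=0 free src
    for src in "$@"; do
        needed=$(( needed + $(dir_size_kb "$src") ))
    done
    free="$(free_space_kb "$dest")"
    if [ -z "$free" ]; then
        echo "ERROR: could not determine free space on ${dest}" >&2
        return 1
    fi
    if [ "$needed" -ge "$free" ]; then
        echo "ERROR: need ${needed} KiB but only ${free} KiB free on ${dest}" >&2
        return 1
    fi
    echo "Space check OK: need ${needed} KiB, ${free} KiB free"
}

# copy_tree <src> <dest>: copy the contents of <src> into <dest>.
# -a preserves permissions, timestamps, and symlinks; re-running after an
# interruption transfers only files that are missing or changed.
copy_tree() {
    local src="$1" dest="$2"
    mkdir -p "$dest"
    rsync -av --progress "${src%/}/" "${dest%/}/"
}
```

A caller would run `check_space` once against the NFS mount with both source directories, then call `copy_tree` for golden-repos/ and again for .versioned/.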
AC3: Alias JSON Path Update
Scenario: Alias JSON files are updated to reflect the new NFS-based paths.
Given alias JSON files contain target_path pointing to local filesystem paths
When the migration updates paths
Then all target_path values are rewritten to use the NFS mount base path
And the old local path prefix is replaced with the NFS mount path
And the JSON structure is otherwise unchanged
Technical Requirements:
- Scan all `*.json` files in `<nfs_mount>/golden-repos/`
- Replace path prefix: `~/.cidx-server/` -> `<nfs_mount>/`
- Validate JSON after update (no corruption)
- Backup original JSON files before modification
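One possible shape for the path rewrite is sketched below. It assumes each alias file is a JSON object with a top-level `target_path` string, and that `python3` is available (the data migration step already depends on it). The function name `update_alias_paths` and the `.bak` backup suffix are hypothetical choices for this example.

```shell
#!/usr/bin/env bash
# Sketch only: update_alias_paths and the .bak suffix are assumptions.

# update_alias_paths <alias_dir> <old_prefix> <new_prefix>
update_alias_paths() {
    local alias_dir="$1" old_prefix="$2" new_prefix="$3"
    local file count=0
    for file in "$alias_dir"/*.json; do
        [ -f "$file" ] || continue                      # glob matched nothing
        [ -e "$file.bak" ] || cp -p "$file" "$file.bak" # keep the first backup on re-runs
        if ! python3 - "$file" "$old_prefix" "$new_prefix" <<'PY'
import json, os, sys

path, old, new = sys.argv[1:4]
with open(path, encoding="utf-8") as fh:
    data = json.load(fh)

# Rewrite only the prefix; every other key is left as it was.
target = data.get("target_path")
if isinstance(target, str) and target.startswith(old):
    data["target_path"] = new + target[len(old):]

# Write to a temp file, re-parse it to validate, then replace atomically.
tmp = path + ".tmp"
with open(tmp, "w", encoding="utf-8") as fh:
    json.dump(data, fh, indent=2)
    fh.write("\n")
with open(tmp, encoding="utf-8") as fh:
    json.load(fh)
os.replace(tmp, path)
PY
        then
            echo "ERROR: failed to update $file (backup kept at $file.bak)" >&2
            return 1
        fi
        count=$(( count + 1 ))
    done
    echo "Updated $count alias files"
}
```

Key order and values are preserved, though whitespace is normalized by `json.dump`. Because a file whose `target_path` no longer starts with the old prefix is left unchanged, running the function a second time is harmless.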
AC4: Pre-Migration Validation
Scenario: The script validates prerequisites before starting.
Given the operator runs the cluster-migrate script
When pre-migration checks run
Then it verifies cluster-join.sh has been run (config.json has cluster settings)
And it verifies NFS mount is active and writable
And it verifies PostgreSQL is reachable
And it verifies the CIDX server is stopped (not running during migration)
And it verifies sufficient disk space on NFS mount
And if any check fails, it reports the issue and exits without migrating
Technical Requirements:
- Check config.json has `storage_mode: "postgres"` (cluster-join was run)
- Check NFS mount: `mountpoint -q <mount_point>`
- Check PostgreSQL: test connection
- Check server stopped: `systemctl is-active cidx-server` returns "inactive"
- Check disk space: compare local data size vs NFS free space
- Exit with clear message on any failure
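A minimal sketch of how these checks might be wired together is shown below. The helper names are hypothetical, and using `pg_isready` for the PostgreSQL check is an assumption: the story only says "test connection", and `pg_isready` requires the PostgreSQL client tools to be installed. The disk space check is omitted here for brevity.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the use of pg_isready are assumptions.

FAILED_CHECKS=0

# run_check <label> <command...>: run one check and record a failure if it fails.
run_check() {
    local label="$1"; shift
    if "$@" >/dev/null 2>&1; then
        printf '  %-16s OK\n' "$label:"
    else
        printf '  %-16s FAILED\n' "$label:"
        FAILED_CHECKS=$(( FAILED_CHECKS + 1 ))
    fi
}

# True if config.json contains "storage_mode": "postgres".
config_is_cluster() {
    grep -Eq '"storage_mode"[[:space:]]*:[[:space:]]*"postgres"' "$1"
}

# True if the path is an active mount point and writable by this user.
nfs_is_writable() {
    mountpoint -q "$1" && [ -w "$1" ]
}

# True if the PostgreSQL server accepts connections (assumes pg_isready exists).
postgres_is_reachable() {
    pg_isready -h "$1" -p "$2"
}

# True if systemd reports the unit as "inactive".
server_is_stopped() {
    [ "$(systemctl is-active cidx-server 2>/dev/null)" = "inactive" ]
}

# run_prechecks <config.json> <mount_point> <pg_host> <pg_port>
run_prechecks() {
    local config="$1" mount="$2" pg_host="$3" pg_port="$4"
    FAILED_CHECKS=0
    echo "Pre-checks:"
    run_check "Cluster config" config_is_cluster "$config"
    run_check "NFS mount" nfs_is_writable "$mount"
    run_check "PostgreSQL" postgres_is_reachable "$pg_host" "$pg_port"
    run_check "Server stopped" server_is_stopped
    if [ "$FAILED_CHECKS" -gt 0 ]; then
        echo "ERROR: ${FAILED_CHECKS} pre-check(s) failed; nothing was migrated." >&2
        return 1
    fi
}
```

Running every check before exiting, rather than stopping at the first failure, lets the operator fix all missing prerequisites in one pass.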
AC5: Post-Migration Verification
Scenario: The script verifies the migration was successful.
Given the migration has completed
When the post-migration verification runs
Then it starts the CIDX server
And it waits for the server to be healthy (health endpoint returns 200)
And it lists repositories via API and verifies all repos are accessible
And it runs a test query against one repository to verify indexes work
And it reports the verification results
Technical Requirements:
- Start server: `sudo systemctl start cidx-server`
- Health check: poll `/health` endpoint until ready (30s timeout)
- List repos: call list_repositories API
- Test query: simple search_code query against first repo
- Report: "Migration verified: N repos accessible, query test passed"
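The health polling step might look like the sketch below. It assumes `curl` is installed; the function name `wait_for_health` is hypothetical, and the server's base URL is left to the caller because the story does not specify a host or port.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for_health is a hypothetical helper; assumes curl is installed.

# wait_for_health <url> [timeout_seconds]
# Poll <url> once per second until it returns HTTP 200 or the timeout expires.
wait_for_health() {
    local url="$1" timeout="${2:-30}"
    local start=$SECONDS status
    while [ $(( SECONDS - start )) -lt "$timeout" ]; do
        # -w '%{http_code}' prints only the status code; 000 means no response.
        status="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url" || true)"
        if [ "$status" = "200" ]; then
            echo "Server startup: OK (healthy in $(( SECONDS - start ))s)"
            return 0
        fi
        sleep 1
    done
    echo "ERROR: $url did not return 200 within ${timeout}s" >&2
    return 1
}
```

The script would call this right after `sudo systemctl start cidx-server`, passing the server's `/health` URL and the 30-second timeout, and print rollback instructions if it returns non-zero.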
AC6: Rollback Capability
Scenario: If migration fails, the standalone server can be restored.
Given the migration fails at any step
When the operator wants to rollback
Then the original SQLite databases are unchanged (read-only during migration)
And the original config.json is backed up as config.json.pre-cluster
And restoring config.json.pre-cluster and restarting returns to standalone mode
And the script provides rollback instructions on failure
Technical Requirements:
- SQLite databases read during migration, never modified
- Config backup: `config.json.pre-cluster`
- On failure: print rollback instructions
- Local golden-repos/ preserved (copied to NFS, not moved)
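The backup and rollback message could be sketched as follows. The `CIDX_HOME` variable and both function names are assumptions for this example; the default directory and the rollback commands are taken from the story itself.

```shell
#!/usr/bin/env bash
# Sketch only: CIDX_HOME and the function names are illustrative assumptions.

CIDX_HOME="${CIDX_HOME:-$HOME/.cidx-server}"

# Back up config.json once. An existing backup is never overwritten, so a
# re-run after a partial failure cannot replace it with a later config.
backup_config() {
    local config="$CIDX_HOME/config.json"
    local backup="$config.pre-cluster"
    if [ -e "$backup" ]; then
        echo "Config backup already exists: $backup"
        return 0
    fi
    cp -p "$config" "$backup"
    echo "Backed up config to $backup"
}

# Print the manual steps that return the server to standalone mode.
print_rollback() {
    cat <<EOF
Rollback (if needed):
  sudo systemctl stop cidx-server
  cp $CIDX_HOME/config.json.pre-cluster $CIDX_HOME/config.json
  sudo systemctl start cidx-server
EOF
}
```

A caller might invoke `print_rollback >&2` from its failure path so the instructions appear whichever step failed.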
Implementation Status
- Core implementation complete
- Unit tests passing
- Integration tests passing
- E2E tests passing
- Code review approved
- Manual E2E testing completed
- Documentation updated
Technical Implementation Details
File Structure
scripts/
cluster-migrate.sh # Main migration orchestration script
Script Flow
1. Pre-migration validation
a. Check config.json has cluster settings
b. Check NFS mount is active
c. Check PostgreSQL is reachable
d. Check server is stopped
e. Check disk space
2. Backup config.json -> config.json.pre-cluster
3. Run SQLite-to-PostgreSQL data migration (Story 10)
4. Copy golden-repos/ to NFS mount (rsync)
5. Copy .versioned/ to NFS mount (rsync)
6. Update alias JSON paths
7. Start server in cluster mode
8. Post-migration verification
9. Print summary
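The flow above can be driven by a small step runner that stops at the first failure. The sketch below is one way to do it. `run_steps` and its `description=function` argument format are assumptions, and each named step function would have to be defined elsewhere in the script.

```shell
#!/usr/bin/env bash
# Sketch only: run_steps and the "description=function" format are assumptions.

# run_steps "<description>=<function>"...
# Run each function in order, printing a "Step n/total" header, and stop
# at the first one that returns non-zero.
run_steps() {
    local total=$# n=0 entry desc cmd
    for entry in "$@"; do
        n=$(( n + 1 ))
        desc="${entry%%=*}"   # text before the first "="
        cmd="${entry#*=}"     # text after the first "="
        echo "Step ${n}/${total}: ${desc}"
        if ! "$cmd"; then
            echo "ERROR: step ${n}/${total} failed: ${desc}" >&2
            return 1
        fi
    done
    echo "Migration complete!"
}
```

A call such as `run_steps "SQLite to PostgreSQL data migration=migrate_database" "Copy golden repos to shared storage=copy_golden_repos"` would print headers in the style of the output example; `migrate_database` and `copy_golden_repos` are placeholder names.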
Output Example
CIDX Cluster Migration
=======================
Pre-checks:
Cluster config: OK (storage_mode: postgres)
NFS mount: OK (/mnt/cidx-shared, 450GB free)
PostgreSQL: OK (postgresql://cidx@pg-host:5432/cidx)
Server stopped: OK
Step 1/5: SQLite to PostgreSQL data migration
Migrating users... 45 rows OK
Migrating global_repos... 12 rows OK
[...]
Validation: PASSED
Step 2/5: Copy golden repos to shared storage
rsync golden-repos/ -> /mnt/cidx-shared/golden-repos/
12 repositories, 8.4 GB total ... done
Step 3/5: Copy versioned snapshots
rsync .versioned/ -> /mnt/cidx-shared/.versioned/
12 snapshots, 8.2 GB total ... done
Step 4/5: Update alias JSON paths
Updated 12 alias files
Step 5/5: Post-migration verification
Server startup: OK (healthy in 8s)
Repositories: 12/12 accessible
Query test: OK (search_code returned results)
Migration complete!
This server is now the first node of the cluster.
To add more nodes: run 'scripts/cluster-join.sh' on each new server.
Rollback (if needed):
sudo systemctl stop cidx-server
cp ~/.cidx-server/config.json.pre-cluster ~/.cidx-server/config.json
sudo systemctl start cidx-server
Testing Requirements
- Automated: Pre-migration validation catches missing prerequisites.
- Automated: Alias JSON path rewriting works correctly.
- Automated: Rollback instructions are correct.
- Manual E2E: Run full migration on a test standalone server with real data. Verify all repos accessible, queries work, dashboard shows data in cluster mode. Test rollback by restoring config.
Definition of Done
- End-to-end migration orchestration script operational
- SQLite-to-PostgreSQL data migration invoked successfully
- Golden repo files copied to NFS mount via rsync
- Alias JSON paths updated for NFS base
- Pre-migration validation catches all prerequisites
- Post-migration verification confirms repos and queries work
- Rollback path documented and tested
- Script is idempotent (safe to re-run on partial failure)